FOREWORD



In late 1986, the Social Science Education Consortium Board of Directors launched the SSEC
Monograph Program. Recognizing that sound, scholarly work often goes unnoticed, the Board decided to
take the initiative in continuing the mission for which the SSEC has become known: offering "cutting edge"
information to the social studies profession. Thus, the purpose of the SSEC Monograph Program is to
publish scholarly monographs in the field of social studies/social science education that make a significant
contribution to the profession. It is with this purpose in mind that we offer Toward Improving Research in Social
Studies Education as the first SSEC monograph.

In 1985 we were painfully reminded by several scholars of a host of defects in social studies research:
lack of replicability, lack of innovative methodology, lack of external and internal validity, and inappropriate
application of statistical techniques, to name a few. Some scholars even used the terms "trivial" and
"mindless" to characterize the research in our field (see Stanley 1985). Jack Fraenkel and Norm Wallen took the
criticism of research in social studies seriously. They decided to examine actual studies systematically. In
preparing this monograph, they used rigorous criteria to analyze 118 studies published in three major
sources, Theory and Research in Social Education (TRSE), the Journal of Social Studies Research (JSSR), and
the research section of Social Education (SE), for the years 1979 through 1986. They take their analysis a
step further, providing helpful suggestions for improvement to professors, graduate students, and classroom
teachers who may be planning to conduct research. To bring their analytical criteria to an operational level,
they critique one study in detail, illustrating both its strengths and weaknesses. Finally, they offer practical
suggestions to classroom teachers who may become involved in school and classroom research.

This is not a volume that should be purchased by only a few scholars to be put in the research section
of a personal library. The monograph is important to any social studies professional who wants to avoid the
research mistakes of the past, or who wishes to critique a research study or proposal. The standards for
judging research are an important guide for any committed social studies educator, whether or not they are
engaged in research. Toward Improving Research in Social Studies Education is "must" reading for all
social studies educators.

James E. Davis, Chair 
SSEC Board of Directors 
Publications Committee 



Do you have an unpublished study or treatise on social studies/social science education that you would
like to have considered for the new SSEC monograph series? If so, send a proposal and outline to:



SSEC Board of Directors
c/o Marcia Hutson

855 Broadway
Boulder, CO 80302
INTRODUCTION



Criticisms of the nature and quality of education- 
al research in general continue to appear in the 
professional literature. More and more frequently, 
one sees arguments or proposals for changing not 
only the nature of research but also the standards 
by which it is judged. There have been arguments
to move toward qualitative (as opposed to quantita- 
tive) analyses, to integrate quantitative and qualita- 
tive methods of inquiry, even to consider and 
develop new methods of inquiry altogether (Al- 
lender 1986). Researchers have been urged to 
place less emphasis on external validity (Mook 
1983), to decrease their use of inferential statistics 
(Carver 1978), to concentrate on common-sense in- 
terpretations and replication to promote under- 
standing (Stake 1978), to consider introspection 
and speculation as valid scientific methods (Bakan 
1975), to conduct unrationalized (i.e., unplanned) 
studies (Larkins and Puckett 1983), and even to 
consider art as a model for scientific investigation 
(Eisner 1981). 

Research in social studies education has not
escaped these criticisms and suggestions. Social
studies research has been criticized for sampling
bias, inappropriate methodologies, incorrect or
inappropriate use of statistics, weak or ill-defined
treatments, and lack of replication and/or
longitudinal follow-up. Many social studies research
questions are said to be trivial. Control and
experimental groups are seldom equivalent.
Hawthorne or John Henry effects contaminate findings.
Aptitude-treatment interactions are almost totally
ignored. Instruments are poorly designed, frequently
lacking validity or reliability. The durability of
effects, when any are detected, is almost never
assessed. Statistical procedures are frequently
inappropriate. Legitimate generalizability is almost
nonexistent (e.g., see Cornbleth 1982; Fraenkel 1987;
Larkins and McKinney 1980; Leming 1985; Martorella
1977; Nelson and Shaver 1985; Newmann 1985;
Shaver 1979b; Shaver and Norton 1980; Wallen
1983; Wallen and Fraenkel 1988).

In the late 1970s, Shaver and Norton (1980)
reported that only a small percentage of 53 articles
in two social studies journals involved random
sampling or assignment, replicated previous work,
or limited their conclusions because of shortcomings
in accessible populations and samples. Intrigued by
their findings, we decided to investigate whether
current social studies research efforts continue to
suffer from these (or other) faults. We wanted to
see if past criticisms were true of current research
efforts as well.

Accordingly, we reviewed the research (with cer- 
tain exceptions) reported in Theory and Research 
in Social Education (TRSE), The Journal of Social 
Studies Research (JSSR), and the research section 
of Social Education (SE) for the years 1979-1986. 
We wanted to look at a number of characteristics
in addition to those which Shaver and Norton
studied, however. This monograph presents the
results of our work.

The monograph contains six chapters. In the 
first, we critique all of the empirical studies 
published in TRSE, JSSR, and the research section 
of SE between 1979 and 1986. In Chapter 2, we 
offer some observations, based on the analysis in 
Chapter 1, about the nature of current social 
studies research. 

In Chapter 3, we offer some ideas about how 
the quality of social studies research might be im- 
proved. We direct the remarks in this section to 
three distinct groups of social studies educators: 
(1) professors who direct master's theses or
doctoral dissertations, but who do not teach courses
in educational research, (2) graduate students who
intend to do research, and (3) classroom teachers 
who have an interest in research. 

In Chapter 4, we evaluate a single study in 
depth, using the same criteria discussed in Chapter
1. We analyze both the weaknesses and strengths
of this study in order to illustrate not only those
procedures we believe researchers should avoid, but
also those they should employ in social studies
research.

In Chapter 5, we offer some ideas about how
classroom teachers of social studies might become
more involved in research in their classrooms and
schools. Chapter 6 lists the studies reviewed.
CHAPTER 1
Overview of the Study

We reviewed all of the research published from 
1979 through 1986 in Theory and Research in So- 
cial Education (volumes 7 through 15), The Journal 
of Social Studies Research (volumes 3 through
10), and the research section of Social Education
(volumes 43 through 49), with certain exceptions 
(as noted below). The instrument shown in Figure 
1 (see page 4) was used to analyze and evaluate 
the studies. 

Articles falling in one or more of the following 
categories were not analyzed: 

• Arguments or position papers, in which the 
author(s) argued that a particular position or 
program of some sort should be adopted or 
considered by the social studies profession. 

• Historical studies, in which the author(s)
described, reviewed, and/or analyzed some 
aspect of social studies education in the past. 

• Content analyses, in which the author(s) 
analyzed the contents of textbooks or other 
types of social studies documents. 

• Philosophical inquiries, in which the author(s) 
presented rationale statements of some sort or 
delved into the meaning of various terms used
by social studies professionals. 



• Methodological proposals, in which the 
author(s) proposed that a certain type of
method be utilized by social studies teachers
or researchers.

• Literature reviews, in which the author(s)
presented a summary of previous research 
and/or commentary on a topic or issue. 

• Reaction papers, in which the author(s) 
reacted to critiques of their work that had ap- 
peared in an earlier issue of the journal. 

• Validity or instrument development studies.

• Book reviews. 

Of some 133 articles published in TRSE for this
period, 87 (65 percent) fell into the above 
categories. Of some 73 articles published in the 
JSSR and another 33 published in the research sec- 
tion of SE for the same period, 18 (26 percent) and 
16 (49 percent), respectively, fell into the above 
categories. These types of articles were not
reviewed because they did not lend themselves to
the kind of analysis we performed. We therefore
imply nothing about the quality of these articles
by their omission.
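
The counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python (a language chosen here only for illustration; the variable names are not part of the original study) that subtracts the excluded articles from the articles published in each journal.

```python
# Articles published and articles excluded per journal, as reported in the text.
published = {"TRSE": 133, "JSSR": 73, "SE": 33}
excluded = {"TRSE": 87, "JSSR": 18, "SE": 16}

# Studies actually reviewed = articles published minus articles excluded.
reviewed = {journal: published[journal] - excluded[journal] for journal in published}
total_reviewed = sum(reviewed.values())

for journal, count in reviewed.items():
    print(f"{journal}: {count} studies reviewed")
print(f"Total: {total_reviewed}")
```

The per-journal results (46, 55, and 17) sum to the 118 studies reviewed.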

A total of 118 articles in the three journals were 
reviewed. Table 1 gives a breakdown of the studies 
by type. The categories listed in the table were
defined as follows:







Table 1
BREAKDOWN OF STUDIES BY TYPE

Type of Study           TRSE         JSSR         SE           Total
Pre-experiments         0 (0%)       4 (6%)       2 (11%)      6 (5%)
True experiments        7 (15%)      11 (17%)     5 (26%)      23 (18%)
Quasi-experiments       7 (15%)      9 (14%)      8 (42%)      24 (19%)
Correlational studies   9 (19%)      10 (16%)     0 (0%)       19 (15%)
Surveys                 9 (19%)      23 (37%)     3 (16%)      35 (27%)
Interviews              6 (13%)      1 (2%)       1 (5%)       8 (6%)
Causal-comparisons      0 (0%)       3 (5%)       0 (0%)       3 (2%)
Ethnographies           9 (19%)      2 (3%)       0 (0%)       11 (8%)
Totals                  n = 47 a     n = 63 a     n = 19 a     129 a

a These totals exceed the actual number of studies reviewed because, in a few instances, two
methodologies were used in the same study.

Figure 1. Categories used to evaluate social studies research



1. Type of Research 

a. Experimental 

1) Pre 

2) True 

3) Quasi 

b. Correlational 

c. Survey 

d. Interview 

e. Causal-comparative 

f. Ethnographic 

2. Justification 

a. No mention of justification 

b. Explicit argument made with regard to 
worth of study 

c. Worth of study is implied 

d. Any ethical considerations overlooked? 

3. Clarity 

a. Focus clear? 

b. Variables clear? 

1) Initially 

2) Eventually 

3) Never 

c. Is treatment in intervention studies made 
explicit? 

d. Is there a hypothesis? 

1) No 

2) Yes: Explicitly stated 

3) Yes: Clearly implied 

4. Are Key Terms Defined?

a. No 

b. Operationally 

c. Constitutively

d. Clear in context of study 

5. Sample 

a. Type 

1) Random selection 

2) Representation based on argument

3) Convenience 

4) Volunteer 

5) Can't tell 

b. Was sample adequately described? 
(1 = high; 5 = low) 

c. Size of sample (n) 

6. Internal Validity 

a. Possible alternative explanations for 
outcomes obtained 

1) History 

2) Maturation 

3) Mortality 

4) Selection bias/Subject characteristics 

5) Pretest effect 

6) Regression effect 



7) Instrumentation 

8) Hawthorne or John Henry effect 

9) Order effect 

b. Threats discussed and clarified?

c. Was it clear that the treatment received an
adequate trial? (in intervention studies)

d. Was length of time of treatment sufficient?

7. Instrumentation

a. Reliability

1) Empirical check made?

2) If yes, was reliability adequate for
study?

b. Validity

1) Empirical check made?

2) If yes, type:

a) Content 

b) Concurrent 

c) Construct 

8. External Validity 

a. Discussion of population generalizability

1) Appropriate 

a) Explicit reference to defensible 
target population 

b) Appropriate caution expressed 

2) Inappropriate 

a) No mention of generalizability 

b) Explicit reference to indefensible 
target population 

b. Discussion of ecological generalizability

1) Appropriate 

a) Explicit reference to defensible 
settings (subject matter, materials, 
physical conditions, personnel, 
etc.) 

b) Appropriate caution expressed 

2) Inappropriate 

a) No mention of generalizability 

b) Explicit reference to indefensible 
settings 

9. Were Results and Interpretations Kept 
Distinct? 

10. Data Analysis 

a. Descriptive statistics? 

1) Correct technique? 

2) Correct interpretation? 

b. Inferential statistics?

1) Correct technique?

2) Correct interpretation?

11. Do Data Justify Conclusions?

12. Were Outcomes of Study Educationally
Significant?

13. Relevance of Citations

• Pre-experiments. We use this label to refer to
any of the three types of "weak" research
designs first described by Campbell and Stanley
(1963): the one-shot case study, the one-group
pretest-posttest design, and the static-group
comparison design. As suggested by
Stouffer almost four decades ago, studies
employing such designs have so little control
that they have almost no scientific value
(Stouffer 1950).

• True experiments. Two or more groups of 
subjects receiving different treatments were 
compared in some way. Random assignment 
of subjects to treatment and control groups
was assured. Administration of the treatment 
was controlled by the researcher. 

• Quasi-experiments. Two or more groups of
subjects were compared in some way. Ran- 
dom assignment of subjects to treatment and 
control groups did not occur. Administration of 
the treatment variable may or may not have 
been controlled by the researcher. 

• Correlational studies. The scores of one
group of subjects on two different measures
were correlated. Such subsequent analyses as
multiple regression or path analysis may have
been performed.

• Surveys. A written questionnaire or test was 
administered, either by mail or in person, to 
one or more groups of subjects. No treatment 
was involved. The responses of the subjects 
were reported. 

• Interviews. An interview schedule was
prepared and administered orally (under the su- 
pervision of the researcher) to one or more 
groups of subjects. No treatment was involved. 
The subjects' responses to the questions were 
reported. 

• Causal-comparisons. Two or more groups dif- 
fering in known ways were compared on one 
or more variables. The intent was to explore 
possible causation between group member- 
ship and the other variable(s). 

• Ethnographies. The daily activities of one or 
more individuals were studied in naturalistic 
settings. These activities, and the manner of
performing them, were described in detail. Case
studies, involving only a single individual, were
included in this category.

We acknowledge that this typology is imperfect;
some studies, for example, involved more than one
methodology. We decided to classify a study,
therefore, according to the method or methods used to
study the relationships or issues of interest; if more
than one methodology was used, we counted
both. We did not classify those studies that used
analysis of covariance under "Correlational,"
however, since the use of correlation is an adjunct
to the question of interest: the comparison of
means. Furthermore, although ethnographic
research may, and often does, incorporate interview
procedures, we did not count this under
"Interviews" since we believe this is generally
understood.
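
The counting rule described above, in which a study that used two methodologies is tallied once under each, is the reason the column totals in Table 1 exceed the number of studies reviewed. A minimal sketch of that rule in Python follows. The three study records are hypothetical, invented purely for illustration, and do not correspond to any study reviewed.

```python
from collections import Counter

# Hypothetical study records (not taken from the reviewed journals):
# each study lists every methodology it used.
studies = [
    {"id": "A", "methods": ["Survey"]},
    {"id": "B", "methods": ["Quasi-experiment", "Interview"]},
    {"id": "C", "methods": ["Survey", "Correlational"]},
]

# Tally each methodology once for every study that used it.
tally = Counter(method for study in studies for method in study["methods"])

print(dict(tally))
print("Studies:", len(studies), "Method tallies:", sum(tally.values()))
```

In this illustration three studies produce five tallies, in the same way that the 118 studies reviewed produce the 129 entries in Table 1.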

Procedures 

The instrument used for the analysis is shown in
Figure 1. The categories listed therein were defined
as follows.

1. Type of research: see discussion above.

2. Justification of study: the degree to which
the worth of the study was explicitly argued for
and/or defended. We also looked to see if there
were any ethical considerations involved (i.e.,
whether there might be any physical or psychological
harm to the subjects) and, if so, whether the
author(s) took them into account.

3. Clarity: the degree to which the study was
clear. The concern here was with the focus of the
study (its purpose and direction) and the degree
to which (and when) the author(s) identified the
variables they were investigating. We also looked
to see if, in intervention studies, the exact nature of
the treatment was made explicit and, if so, when.
Finally, any hypotheses that existed were identified,
and the degree to which they were made explicit
was assessed.

4. Definitions: the degree to which important
terms in the study were defined, and how.

5. Sample: the type, size, and adequacy of
description of the subjects involved in the study.

6. Internal validity: the number of plausible
alternative explanations we identified for any
reported outcomes and the extent to which these
alternatives were identified and discussed by the
author(s). We also considered whether it was clear
that a treatment (in the intervention studies)
actually occurred and, when it did, whether the length of
time of the treatment could be considered
sufficient to produce the effect(s) intended.

7. Instrumentation: the degree to which any
and all instruments used were demonstrably
reliable and/or valid. We considered in particular
whether the investigator(s) conducted any form of
reliability and/or validity check of the instruments
used and, if so, whether these checks were
adequate for their purposes.

8. External validity: the extent to which the
findings of the study were generalizable beyond the
particular sample studied. Considerations here
included both population and ecological
generalizability, when and where the author(s) generalized
appropriately (and, if so, to whom), when and where
they did not, and when they could not, whether
they explained why.

9. Distinction between results and
conclusions: the extent to which the author(s) clearly
differentiated between their findings (empirical
data) and the conclusions they arrived at based on
their findings (subjective opinion).

10. Data analysis: correct and appropriate
use and interpretation of both descriptive and
inferential statistics.

11. Legitimacy of conclusions: whether
limitations raised crucial questions about the
conclusions drawn.

12. Educational significance of the study: our
judgment of the importance of the study in
practical or theoretical, as opposed to statistical, terms.

13. Relevance of citations: the degree to
which works cited in the articles were germane to
the research being reported.

Each of us independently read and evaluated 
every study. We then met and compared our 
analyses. We do not report agreement of inde- 
pendent scoring because, although we had dis- 
agreements, the great majority were either clear 
oversights by one of us or easily resolved. It would 
have been desirable to compare our analysis with 
the findings of a second set of evaluators, but this 
was not feasible. 

Results of Analysis 

In the remainder of this section, we present the 
results of our analysis, using the categories from 
Figure 1 to organize our remarks. Both descriptive 
summaries of our findings and our interpretation of 
them are reported, along with examples of both 
good and bad practice. 1 We offer our observations 
on what these studies suggest about social studies 
research in Chapter 2. 

Type of Research. The breakdown by type of
research was shown earlier in Table 1. As can be
seen, experimental and survey research
predominate. This finding is in line with what other
reviews have indicated (e.g., see Armento 1986;
Stanley 1985). Of interest, however, is the
preponderance of quasi-experiments in SE (some
42 percent of those reviewed); the rather large
number of correlational studies in TRSE and the JSSR
(almost 19 percent of the total number reviewed in
TRSE and 16 percent in the JSSR); the equally
large number of ethnographic studies in TRSE (19
percent of those reviewed); and the high
percentage of questionnaire-type surveys in the JSSR (37
percent of those reviewed). One type of research
methodology was particularly noticeable by its
omission: ex post facto research. 2 We found not
one example of this type of research published in
any of the three journals during the period of this
review.

It is worth noting that of the total number of ar- 
ticles published in TRSE during this period, 47 (35 
percent) were arguments of one sort or another. 
This seems to be an unduly large proportion of the 
total number of articles published in this journal. 
The percentage of articles that were arguments 
was much lower in the other two journals. 

Justification of Study. To what extent were
these studies justified -that is, to what extent did 
the authors attempt to defend the worthwhileness 
of their research? A justification was considered to 
be any attempt by the authors either to argue ex- 
plicitly why they thought their study was worth 
doing or clearly to imply its worth through their 
remarks. 

The great majority of researchers did make an 
explicit argument for the worth of their research 
and did not simply take it for granted. Only five 
studies, in all three journals, did not contain some 
form of argument about the worth of the intended 
research. With regard to the ethics of these 
studies, in only one (out of 118) did we find cause 
for concern. This did not involve potential harm to 
the subjects, however, but rather what we con- 
sidered to be inappropriate value judgments per- 
taining to another culture. The results in this 
category are shown in Table 2. 

Clarity. The clarity of these studies received a
mixed review (Table 3). We were pleasantly
surprised to find that the focus (the overall intent)
of every study was clear. We had no trouble
whatsoever discovering what the authors intended to
investigate. The particular variables
being investigated, however, were not always made
clear. To be sure, in the great majority of studies in
all three journals, the variables were made clear at
the start. In seven of the studies in TRSE, however,
it took us some time to be sure about the nature of
the variables involved; in another eight, we never
could discern what the variables were.

Of the eight studies in which the variables were
unclear, five were ethnographies. Since one of the
claims made for ethnographic research is the
elucidation of meaningful variables, this failing
seems rather serious. The authors of the remaining
four ethnographies published in TRSE did succeed
in making their variables clear, however. We
recognize that one purpose of ethnographic research is
sometimes said to be the presentation of ways in
which differing groups give meaning to their
existence, that is, their perceptions of the world. We
did not, however, detect this intent in any of the
ethnographies that we reviewed.










TABLE 2
RESULTS: JUSTIFICATION OF RESEARCH

Justification of Research      TRSE         JSSR         SE           Total
No mention of justification    2 (4%)       3 (6%)       0 (0%)       5 (4%)
Explicit argument made         35 (76%)     44 (80%)     17 (100%)    96 (82%)
Implicit argument found        9 (20%)      8 (14%)      0 (0%)       17 (14%)
Totals                         46           55           17           118

Ethical concerns               0 (0%)       1 (2%)       0 (0%)       1 (1%)



TABLE 3
RESULTS: CLARITY

Clarity of Studies             TRSE         JSSR         SE           Total

Focus clear
  Yes                          46 (100%)    52 (94%)     17 (100%)    115 (97%)
  No                           0 (0%)       1 (2%)       0 (0%)       1 (1%)
  Questionable                 0 (0%)       2 (4%)       0 (0%)       2 (2%)
  Totals                       46           55           17           118 (100%)

Variables
  Clear initially              31 (67%)     51 (93%)     17 (100%)    99 (84%)
  Clear eventually             7 (15%)      3 (5%)       0 (0%)       10 (8%)
  Never clear                  8 (17%)      1 (2%)       0 (0%)       9 (8%)
  Totals                       46           55           17           118 (100%)

Treatment in intervention studies made explicit
  Yes                          12 (26%)     17 (31%)     12 (71%)     41 (35%)
  No                           2 (4%)       5 (9%)       1 (6%)       8 (7%)
  Not applicable               32 (70%)     33 (60%)     4 (23%)      69 (58%)
  Totals                       46           55           17           118 (100%)

Hypothesis present
  No                           18 (39%)     24 (44%)     4 (24%)      46 (39%)
  Yes, explicit                13 (28%)     11 (20%)     5 (29%)      29 (25%)
  Yes, implied                 15 (33%)     20 (36%)     8 (47%)      43 (36%)
  Totals                       46           55           17           118 (100%)

Of the studies published in the JSSR, the
variables were clear in all but one (98 percent). The
variables were clear in all of the articles published
in the research section of SE (100 percent).
Generally, too, in those studies involving an
intervention of some sort, the treatment was made
explicit, although there were a few in each journal in
which we could not be sure what the
treatment actually involved.

Twenty-eight of the 46 studies (61 percent) in
TRSE, 31 of 55 (56 percent) of those in the JSSR,
and 13 of 17 (76 percent) in SE were
hypothesis-testing investigations. In over half of these in all
three journals, however, the hypothesis was
implied (e.g., in the rationale for the study) rather
than being stated explicitly.

Definitions. Definition of key terms by the
authors of these studies also drew a mixed review
(Table 4). Almost 30 percent of the studies in TRSE
and the JSSR lacked any definition of the terms
involved; the figure is over 40 percent for SE.
Interestingly, a disproportionate number (16 of 35) of
these studies were either true or
quasi-experiments. This was especially true in TRSE (7 of 13)
and in SE (7 of 7!). It may be that since these
studies tended to be on more traditional topics,
using technical terms frequently found in the
research literature, the authors assumed that these
terms would be understood by the readership. This
assumption may be questionable, however, and
needs to be considered carefully.

Exactly 50 percent of the studies in TRSE (23 of
46) utilized either operational or constitutive
definitions of terms (or both), compared to 40 percent for
the JSSR and 18 percent for SE. The extent to
which the meaning of the terms involved eventually
became clear within the context of the study varied
across journals, from 35 percent to 47 percent.
Almost all of the TRSE studies having
clear-in-context definitions occurred in the first half of the
studies chronologically, whereas most of the
studies that lacked definitions (10 of 13) occurred
in the more recent 23 studies, allowing us to
conclude that, overall, the adequacy of definitions
decreased during this time period in this journal.
This trend was not evident in the other two journals.
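
Because a study could use both operational and constitutive definitions, the percentages in Table 4 are computed against the number of studies reviewed in each journal rather than against the column total, so they can sum to more than 100. The sketch below reproduces the TRSE column in Python. The variable names are illustrative only, and rounding to the nearest whole percent is an assumption about how the published figures were obtained.

```python
# TRSE counts from Table 4; 46 TRSE studies were reviewed.
trse_studies = 46
trse_counts = {
    "No definitions": 13,
    "Operational definitions": 10,
    "Constitutive definitions": 13,
    "Definitions clear in context": 16,
}

# Percentage of reviewed studies showing each kind of definition
# (whole-percent rounding is an assumption).
percentages = {
    label: round(100 * count / trse_studies) for label, count in trse_counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print("Sum of counts:", sum(trse_counts.values()))
```

The counts sum to 52 rather than 46 because some studies appear in more than one row.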

Sample. Only seven studies (out of a total of
118) had truly random samples (i.e., an initial
random selection from a defined population). Three of
these populations were so narrow as to be of
dubious interest, however. They were (1) enrollees
in teacher education at a particular university, (2)
students from two high schools in the midwest,
and (3) students from two high schools in the
southeast. Three were surveys involving
questionnaires, but the return rates were only 76
percent, 57 percent, and 80 percent, so that
the responding sample was no longer random. Finally,
one involved a cluster sample with an "n" of only
six classrooms, although these were randomly
selected. The great majority were convenience
samples, which, given the difficulties involved in
doing research in the public schools, may (usually)
be about the best one can expect. Table 5 shows
the breakdown by type of sample.

The description of the sample often left a great 
deal to be desired (Table 6). Many times we were 
not clear about the characteristics of the subjects 
involved in a study.





TABLE 4
RESULTS: DEFINITIONS

Definitions                    TRSE         JSSR         SE           Total
No definitions                 13 (28%)     15 (27%)     7 (41%)      35 (30%)
Operational definitions        10 (22%)     6 (11%)      1 (6%)       17 (14%)
Constitutive definitions       13 (28%)     16 (29%)     2 (12%)      31 (26%)
Definitions clear in context   16 (35%)     25 (45%)     8 (47%)      49 (42%)
Totals                         52 a         62 a         18 a         132 a

a Totals do not equal the actual number of studies reviewed, since several studies used both operational
and constitutive definitions. The percentages represent percentage of the total of actual studies reviewed
in which the particular type of definition (or absence of definitions) could be found.

TABLE 5

Type of Sample                 TRSE         JSSR         SE           Total
Total population               0 (0%)       1 (2%)       0 (0%)       1 (1%)
Random selection               2 (4%)       4 (7%)       1 (5%)       7 (6%)
Representation based on
  argument                     6 (13%)      5 (8%)       2 (11%)      13 (11%)
Convenience                    29 (62%)     42 (70%)     16 (84%)     87 (74%)
Volunteer                      4 (9%)       3 (5%)       0 (0%)       7 (6%)
Can't tell                     6 (13%)      5 (8%)       0 (0%)       11 (9%)
Total                          47 a         60 a         19 a         126 a

a Eight studies used more than one type of sample. Percentages represent percentage of the total of
actual studies reviewed in which the particular type of sample was used.

TABLE 6
RESULTS: ADEQUACY OF SAMPLE DESCRIPTION

Adequacy of Sample
Demographics                   TRSE         JSSR         SE           Total
Adequate sample
  demographics given           8 (17%)      1 (2%)       0 (0%)       9 (7%)
Some sample
  demographics given           29 (63%)     22 (40%)     6 (35%)      57 (48%)
No sample demographics
  given                        9 (20%)      32 (58%)     11 (65%)     52 (45%)
Totals                         46           55           17           118 (100%)


The adequacy of sample descriptions is an issue
that is insufficiently discussed in the research
literature. Is there any agreement that certain
demographics, such as gender, age, socioeconomic
status, or geographic area, for example,
should always be reported? We know of no
consensus on this question. Further, descriptive
information must surely depend on the nature of the study.
Perhaps authors should be required to report
evidence that their sample is similar to a defined
target population on variables they consider
important. Perhaps it is unrealistic to expect satisfactory
description. If so, another argument is raised in
favor of replication; similar results obtained in
several samples are an impressive argument for
generalizability. Four studies did report some form
of replication on the same topic, the effects of teacher
enthusiasm (three in TRSE and one in SE). Another
four studies reported partial replications of a
particular method of sequencing examples and
non-examples in concept attainment. Interestingly, seven
out of eight of these replicated studies involved a
common investigator.

The lack of randomness in selecting samples
and inadequate sample description raise serious
questions about the generalizability of almost all
the studies we reviewed; we shall discuss this point
in more detail when we consider external validity.

The sample sizes in these studies varied tremen- 
dously, ranging from an n of one in an eth- 
nographic study, to n's of 589 in an experimental 
study, 1800 in a correlational study, and 4150 in a 
questionnaire-type survey. The range of sample 
size by type of study is shown in Table 7. 

Internal Validity. We were interested in how 
often alternative hypotheses could be suggested to 
explain positive findings. Accordingly, we examined 
each study to see the extent to which one or more 
threats to internal validity, originally identified by 
Campbell and Stanley (1963) and Cook and 
Campbell (1979), might have been present. Often, 
they were. 

We acknowledge that this catalog of threats was originally developed to
apply to experimental or group-comparison studies. As such, some of them
make little sense when applied to correlational, questionnaire,
interview, or ethnographic research (pretest, maturation, regression,
and order effects, in particular). However, the remaining categories are
useful with respect to all methodologies wherein a researcher is
attempting to explore relationships and even (on occasion) when simple
description is the aim. We believe the examples discussed below will
document this point.

The most frequent threats were subject characteristics (other
characteristics of the subjects may have accounted for the results),
mortality (some of the subjects dropped out of one or more comparison
groups in actual or probable unequal amounts), a Hawthorne or John Henry
effect (the subjects in the experimental or control groups knew they
were part of an experiment of some sort), and, especially in the
ethnographic studies, a researcher effect (the researcher may have acted
so as to bias the responses of the subjects in some way). Furthermore,
when these threats existed, the researchers often did not seem to be
aware of them, or at least they failed to discuss their implications
(this tended to improve somewhat in the more recent studies).

Table 8 shows the number of studies of each type in which we identified
threats and (subsequently) the number where we judged them to be
adequately discussed. Surprisingly, 14 of the 22 true experiments
contained one or more threats. These included actual or probable
inequality of groups despite random assignment (n = 7); lack of actual
control over the treatment (n = n); a possible Hawthorne or John Henry
effect (n = 5); mortality (n = 4); and an instrumentation effect
(n = 5). About half of the studies contained discussions of these
threats. We were surprised that only one of the ethnographic reports
acknowledged the possibility of a researcher effect, perhaps because it
is thought to be an intrinsic limitation.

That we identified fewer threats, proportionately, 
for survey studies is not surprising, in that most of 
these studies attempted essentially to describe vari- 
ables rather than to identify relationships. The other 
finding of possible importance is that the authors 
of studies in SE appeared to do a somewhat 
poorer job of discussing threats, possibly a reflec- 
tion of the shorter length of these reports. 



TABLE 7

RANGE OF SAMPLE SIZE BY TYPE OF STUDY

                               TRSE               JSSR                SE
Type of Study             Range    (Med.)    Range    (Med.)    Range    (Med.)

Pre-experiments           0                  16-35    (31)      29       (a)
True experiments          42-589   (211)     24-282   (122)     18-360   (55)
Quasi-experiments         49-925   (200)     35-563   (74)      38-426   (164)
Correlational studies     33-1050  (498)     26-1800  (163)     0
Surveys                   25-554   (234)     16-2097  (93)      42-4150  (a)
Interviews                7-70     (27)      26       (a)       16       (a)
Causal-comparisons        0                  120      (a)       0
Ethnographies             1-138    (12)      3-26     (16)      0

(a) Medians are not reported, since they have little meaning when there are only one or two studies.

TABLE 8

RESULTS: THREATS TO INTERNAL VALIDITY

A. Total Number of Threats to Internal Validity Identified in Each of Three Journals a

                                 TRSE     JSSR     SE       Total b

History                          4        4        6        14 (12%)
Maturation                       0        1        0        1 (1%)
Mortality                        10       7        4        21 (18%)
Subject characteristics          15       31       8        54 (46%)
Pretest effect                   2        6        1        9 (8%)
Regression effect                0        1        0        1 (1%)
Instrumentation                  21       23       3        47 (40%)
Hawthorne/John Henry effect      7        7        10       24 (20%)
Order effect                     0        1        1        2 (2%)



B. Types of Studies in Which Threats Were Identified and Discussed: TRSE

Type                      No. b        Threats Identified c    Threats Discussed d

Pre-experiments           0 (0%)       0 (0%)                  0 (0%)
True experiments          7 (15%)      3 (43%)                 2 (29%)
Quasi-experiments         7 (15%)      7 (100%)                4 (57%)
Correlational studies     9 (19%)      5 (56%)                 3 (33%)
Surveys                   9 (19%)      3 (33%)                 0 (0%)
Interviews                6 (13%)      4 (67%)                 1 (17%)
Causal-comparisons        0 (0%)       0 (0%)                  0 (0%)
Ethnographies             9 (19%)      9 (100%)                0 (0%)

Total                     47 e







C. Types of Studies in Which Threats Were Identified and Discussed: JSSR

Type                      No. b        Threats Identified c    Threats Discussed d

Pre-experiments           4 (6%)       3 (100%)                1 (33%)
True experiments          11 (17%)     8 (73%)                 2 (18%)
Quasi-experiments         9 (14%)      9 (100%)                4 (44%)
Correlational studies     10 (16%)     10 (100%)               5 (50%)
Surveys                   23 (37%)     11 (48%)                4 (17%)
Interviews                1 (2%)       1 (100%)                0 (0%)
Causal-comparisons        3 (5%)       3 (100%)                2 (67%)
Ethnographies             2 (3%)       2 (100%)                0 (0%)

Total                     63 f

D. Types of Studies in Which Threats Were Identified and Discussed: SE

Type                      No. b        Threats Identified c    Threats Discussed d

Pre-experiments           2 (11%)      1 (100%)                0 (0%)
True experiments          5 (26%)      4 (80%)                 1 (20%)
Quasi-experiments         8 (42%)      8 (100%)                1 (13%)
Correlational studies     0 (0%)       0 (0%)                  0 (0%)
Surveys                   3 (16%)      0 (0%)                  0 (0%)
Interviews                1 (5%)       0 (0%)                  0 (0%)
Causal-comparisons        0 (0%)       0 (0%)                  0 (0%)
Ethnographies             0 (0%)       0 (0%)                  0 (0%)

Total                     19 g







a Some studies contained several threats.

b Percentages represent the percentage of the total of actual studies reviewed within each category.

c The numbers and percentages here refer to studies in which threats were identified by us.

d The numbers and percentages here refer to a discussion by the authors of a study of the threats.

e One study used more than one methodology.

f Eight studies used two methodologies.

g One study used two methodologies.



We offer the following illustrations of how threats to internal validity
may appear in other than comparison group studies. Whenever two or more
instruments are used in a study, with both designed to investigate a
particular relationship, an instrumentation threat may develop. There is
sometimes a strong likelihood that at least some respondents will figure
out the hypothesis and alter their responses accordingly, sometimes in
ways making support for the hypothesis more likely. We viewed this as a
problem in studies correlating (1) student self-concept, attitudes
toward social studies, and perceptions of teacher and classroom, and (2)
teacher attitudes toward teaching, self-concept, and acceptance of
responsibility for student achievement.

An instrumentation effect may also occur due to the way instruments are
administered and/or scored. In one study, for example, the report was
such as to raise questions about the independence of scoring of the two
instruments used to test the hypothesis; in another, the same
administrator gave both tests to individual children, one following the
other. In both studies, the instruments themselves were vulnerable to
variations in administration and scoring.

The selection of subjects can create bias if the nature of the sample is
atypical. This is closely related to the problem of generalizing, but is
an additional problem in studies where it appears likely that the way
subjects were obtained favors support for hypotheses. We judged this to
be a problem in studies that (1) reported correlations between teacher
responsibility for student achievement and various other attitudes in a
sample of volunteers for a workshop in mastery learning, (2) reported
relationships between student out-of-school experiences and attitude
toward social studies in schools described as "good" in terms of
environmental opportunities and quality of teachers, (3) reported
correlations between a cloze reading test and a test of text
comprehension with a group of low socioeconomic level students whose
teachers had low expectations for them, and (4) reported correlations
between general concept attainment and understanding of social concepts
with a group of primary children in a university lab school.

We also identified a possible subject selection threat in several survey
studies, including (1) differences in male and female interest in social
science disciplines using a sample from one area in the South, (2)
opinions regarding effects of policies on research with human subjects,
based on responses from a volunteer sample of "interested" faculty
members, (3) teacher perceptions as to the nature of discipline problems
in one school in a low income neighborhood, where discipline was
considered a major problem, and (4) social action activities of social
educators using a sample of volunteer respondents.



A positive sign with regard to internal validity was that it was
generally quite clear in the intervention studies that the treatment was
implemented as intended. We found only nine studies (out of a total of
50) in which this was not clear. Table 9 presents our impressions
related to whether a treatment really did occur.

Whether the length of time of the treatment was sufficient to produce
the intended effects proved to be a very difficult judgment to make, but
we judged the time to be sufficient in only 60 percent of the studies
(Table 10). Sizable differences among the journals appeared. In only
four of the 14 intervention studies reported in TRSE did it seem that
the prescribed treatment was clearly long enough to give the
hypothesized effects an adequate chance to manifest themselves. This
problem was less frequent in the other two journals, at least partly
because the interventions themselves were often less "ambitious."







TABLE 9

RESULTS: TREATMENT

Was It Clear That a
Treatment Occurred?       TRSE         JSSR         SE           Total

Yes                       12 (26%)     17 (31%)     12 (71%)     41 (35%)
Questionable              2 (4%)       6 (11%)      1 (6%)       9 (8%)
Not applicable            32 (70%)     32 (58%)     4 (23%)      68 (57%)

Totals                    46           55           17           118 (100%)







TABLE 10

RESULTS: LENGTH OF TREATMENT

Was Length of Time of
Treatment Sufficient?     TRSE         JSSR         SE           Total

Yes                       4 (9%)       19 (35%)     9 (53%)      32 (27%)
Questionable              9 (19%)      4 (7%)       4 (24%)      17 (14%)
Can't tell                1 (2%)       0 (0%)       0 (0%)       1 (1%)
Not applicable            32 (70%)     32 (58%)     4 (24%)      68 (58%)

Totals                    46           55           17           118

Instrumentation. In this category, we were concerned with the extent to
which researchers ascertained the reliability and validity of the
instrument(s) they used. We looked to see if authors made some sort of
reliability and/or validity check and, in the case of reliability,
whether the reliability reported was adequate for the type of study
being conducted. Those studies for which the answer to this query was
"no" or "questionable" reported indexes below the rather lenient
standard of .70. Here, as in other categories, results were not
homogeneous.

It is somewhat sobering to note that in all three journals, more than
half of the studies did not make any reliability check whatsoever. This
was the case with 25 of the 46 studies reviewed in TRSE; 29 of the 55 in
the JSSR; and 10 of the 17 in SE. We judged reliability to be adequate
in only 27 percent of these studies, including three studies reporting
only scorer or observer agreement. We could find only four in which the
researchers checked the stability of scores over time, probably a more
important issue than internal consistency; only one of these reported
the time interval involved.
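
As a quick arithmetic check, the counts quoted in this paragraph can be turned into percentages and compared with the reliability table. The following is a minimal sketch; the dictionary and the helper name `percent` are assumptions introduced for illustration, and the figures are only those quoted in the text.

```python
# Studies with no empirical reliability check, as (no check, studies reviewed).
# Counts are the ones quoted in the paragraph above.
no_check = {"TRSE": (25, 46), "JSSR": (29, 55), "SE": (10, 17)}


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


for journal, (missing, total) in no_check.items():
    print(f"{journal}: {missing} of {total} ({percent(missing, total)}%)")

# Pool the three journals to get the overall share.
total_missing = sum(missing for missing, _ in no_check.values())
total_studies = sum(total for _, total in no_check.values())
print(f"All: {total_missing} of {total_studies} "
      f"({percent(total_missing, total_studies)}%)")
```

Each journal's share comes out above one half, which is the point the paragraph makes.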

Our findings with regard to validity were even more depressing. A
startling 32 studies (out of 46) in TRSE; 46 studies (out of 55) in the
JSSR; and 13 studies (out of 17) in SE made no attempt to check
instrument validity! Of the 27 which did, only 12 presented evidence
other than judgments. A more detailed breakdown on these data for all
three journals is shown in Tables 11 and 12.

External Validity. External validity, of course, refers to the degree to
which the results of a study are generalizable. This category was
another in which the studies reviewed were distressingly deficient. Both
population and ecological generalizability were considered in this
category. Population generalizability refers to an explicit extension of
the findings of the study to one or more target populations (i.e., other
subjects). Ecological generalizability refers to an explicit reference
to another setting of some sort (i.e., subject matter, materials,
physical conditions, personnel, etc.).

In 22 instances in TRSE, the researchers generalized to indefensible
target populations, although the authors did caution about generalizing
inappropriately in another 13 studies. There was no mention of
population generalizability in eight studies. Inappropriate generalizing
occurred in 33 studies in the JSSR, and in 12 in SE. In both these
journals, the frequency of entries in the "no mention of
generalizability" category was considerably higher than those under
"explicit reference to indefensible population." While this is clearly
preferable, our experience indicates that to most readers, failure to
discuss generalization leads to the erroneous inference that findings
can be generalized without serious reservation.



TABLE 11

RESULTS: RELIABILITY OF INSTRUMENTS

Reliability                    TRSE         JSSR         SE           Total

Empirical check made?
  No                           25 (54%)     29 (53%)     10 (59%)     64 (54%)
  Yes                          21 (46%)     26 (47%)     7 (41%)      54 (46%)

Totals                         46           55           17           118 (100%)

If yes, adequate for study?
  Yes                          12 (26%)     16 (29%)     4 (24%)      32 (27%)
  Questionable                 4 (9%)       7 (13%)      1 (6%)       12 (10%)
  No                           4 (9%)       3 (5%)       2 (12%)      9 (8%)
  Can't tell                   1 (2%)       0 (0%)       0 (0%)       1 (1%)
  Not applicable               25 (54%)     29 (53%)     10 (58%)     64 (54%)

Totals                         46           55           17           118 (100%)

TABLE 12

RESULTS: VALIDITY OF INSTRUMENTS

Validity                       TRSE         JSSR         SE           Total

Empirical check made?
  No                           32 (70%)     46 (84%)     13 (76%)     91 (77%)
  Yes                          14 (30%)     9 (16%)      4 (24%)      27 (23%)

Totals                         46           55           17           118 (100%)

If yes, type:
  Content (logical)            3 (21%)      0 (0%)       0 (0%)       3 (11%)
  Judge-supported              2 (14%)      8 (89%)      4 (100%)     14 (52%)
  Concurrent                   5 (36%)      0 (0%)       0 (0%)       5 (19%)
  Predictive                   0 (0%)       0 (0%)       0 (0%)       0 (0%)
  Construct                    3 (21%)      0 (0%)       0 (0%)       3 (11%)
  Other (including factor
  analysis)                    3 (21%)      1 (11%)      0 (0%)       4 (15%)

Totals                         16 a         9            4            29

a Two studies used two checks. Percentage represents percentage of studies in which instrument validity
was checked.



There was no mention of ecological generalizability in 31 studies in
TRSE, 44 in the JSSR, and 12 in SE, leading us to conclude that this,
perhaps, is not something that these researchers generally considered.
When they did, however, they were quite a bit more careful, with only
six studies in TRSE, none in JSSR, and two in SE containing an explicit
reference to an indefensible setting. We believe that the overall
failure to discuss the ecological generalizability of a study, however,
has the effect of suggesting that such generalizing is warranted. The
breakdown in the three journals is shown in Table 13.

Distinction Between Results and Conclusions. Did the authors of these
studies maintain a distinction between their findings (i.e., what they
observed or obtained) and their interpretations (i.e., the conclusions
they drew based on the nature of their findings)? Overwhelmingly, they
did. Seventy-four percent of the studies in TRSE, 94 percent of those in
the JSSR, and 82 percent of those in SE maintained a sharp distinction
between results and interpretations. This is shown in Table 14.



The major exception was the ethnographic studies, which account for nine
of the fifteen "no's." Although this is a widely known and, to some
extent, unavoidable limitation of this type of study, we feel the
authors of these studies could have done a much better job of making
clear the basis for their interpretations. Failure to do so provides
ammunition for those who allege that ethnographic research is little
more than subjective impressionism.

Data Analysis. In almost all of the studies, the authors used some form
of descriptive or inferential statistics. Did they use the correct
procedure? Generally, yes! The five "no's" for descriptive statistics in
TRSE and the two in the JSSR reflect our opinion that additional
descriptive procedures (e.g., frequency of response) would have greatly
clarified the findings. Three of the five "no's" in TRSE were
ethnographies.

Were the interpretations of these researchers ap- 
propriate given the nature of their studies? Here, 
the answer generally is "yes" when descriptive 
statistics were involved, but overwhelmingly "no" 
when inferential statistics were reported. The 
major error was the inappropriate use of inferential 




TABLE 13

RESULTS: EXTERNAL VALIDITY

                                    TRSE         JSSR         SE           Total

Discussion of Population
Generalizability

Appropriate:
  Explicit reference to
  defensible target population      2 (4%)       5 (9%)       2 (12%)      9 (8%)
  Appropriate caution expressed     13 (28%)     16 (29%)     3 (18%)      32 (27%)

Inappropriate:
  No mention of generalizability    9 (20%)      23 (42%)     7 (41%)      39 (33%)
  Explicit reference to
  indefensible target population    22 (48%)     11 (20%)     5 (30%)      38 (32%)

Totals                              46           55           17           118

Discussion of Ecological
Generalizability

Appropriate:
  Explicit reference to
  defensible settings               2 (4%)       1 (2%)       0 (0%)       3 (2%)
  Appropriate caution expressed     7 (15%)      8 (15%)      3 (18%)      18 (15%)

Inappropriate:
  No mention of generalizability    31 (67%)     44 (80%)     12 (71%)     87 (74%)
  Explicit reference to
  indefensible settings             6 (13%)      0 (0%)       2 (12%)      8 (7%)
  Not applicable                    0 (0%)       2 (3%)       0 (0%)       2 (2%)

Totals                              46           55           17           118

TABLE 14

RESULTS: DISTINCTION BETWEEN RESULTS AND CONCLUSIONS

Distinction Observed
Between Results and
Conclusions?              TRSE         JSSR         SE           Total

Yes                       34 (74%)     51 (93%)     14 (82%)     99 (84%)
No                        12 (26%)     3 (5%)       1 (6%)       16 (14%)
Questionable              0 (0%)       1 (2%)       2 (12%)      3 (2%)

Totals                    46           55           17           118



procedures to test the significance of obtained 
results in studies where the obtained sample was 
not random. A significance test is appropriate only 
when a researcher is assured that he or she has a 
random sample, and this was literally the case in 
only four studies. (See the discussion under 
Sample.) In 13 other studies, the authors argued 
for representativeness and hence (by implication) 
for significance tests; we found only two of these 
persuasive. Some researchers advocate the report- 
ing of significance tests as an indication of impor- 
tant differences but with appropriate qualifications. 
The reporting of effect sizes, however, we think 
would be more informative. Effect size was 
reported in only one study. 

With regard to other forms of inferential 
misinterpretation, the author of one study in TRSE 
made much of the relative contribution of different 
variables to a multiple correlation, even after ex- 
plicitly discussing the likelihood of chance fluctua- 
tions due to the small n (22). The authors of 
another, otherwise commendable, study committed 
the error of treating non-significant differences as 
though the null hypothesis were proven. In fact, the 
differences between the highest group and each of 
the two lowest groups were such as to yield effect 
sizes of approximately .4 to 1.0, depending on 
which standard deviation was used. This mistake 
also appeared in three other studies. 
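
The remark that an effect size depends "on which standard deviation was used" can be made concrete. The following is a minimal sketch, assuming a standardized mean difference computed either with the pooled standard deviation or with the comparison group's standard deviation alone; the function name `effect_size` and the two score lists are hypothetical and are not data from any reviewed study.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(treatment, control, standardizer="pooled"):
    """Standardized mean difference between two groups.

    standardizer="pooled" divides by the pooled standard deviation;
    standardizer="control" divides by the control group's standard
    deviation only.
    """
    diff = mean(treatment) - mean(control)
    if standardizer == "control":
        sd = stdev(control)
    else:
        n1, n2 = len(treatment), len(control)
        pooled_var = ((n1 - 1) * stdev(treatment) ** 2
                      + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
        sd = sqrt(pooled_var)
    return diff / sd


# Hypothetical scores, for illustration only.
treatment = [14, 17, 12, 19, 16, 15, 18, 13]
control = [12, 15, 11, 14, 13, 16, 10, 13]

print(effect_size(treatment, control, "pooled"))
print(effect_size(treatment, control, "control"))
```

The two calls give different values for the same pair of groups, which is why a report of effect size should state the standardizer used.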

We found several studies in all three journals in which the authors
apparently confused random assignment with random selection. Random
assignment is a powerful, though imperfect, technique for equating
groups. Further, it permits comparison of variance between groups with
variance within groups. It does not, however, justify the calculation of
significance tests, because generalization is a separate issue from both
the equating of groups and assessing the magnitude of differences. When
reporting a significant difference between two groups equated by random
assignment, the question is: "To what population may this difference be
generalized?" In the absence of random sampling, or of a persuasive
argument for representativeness, and particularly in the case of
convenience samples, which were used in virtually all of these studies,
the answer must be: "No one knows!" Therefore, the information presumed
in the finding of significance is, at best, only somewhat informative
and, at worst, misleading unless carefully clarified by the authors, a
practice glaringly absent from these reports, probably because it is
virtually impossible to do.
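
The difference between the two procedures can be sketched in a few lines. This is an illustration only, assuming a hypothetical target population of 1,000 numbered individuals and a convenience sample made up of the first 40 of them; the function names are our own labels and not terms from the studies reviewed.

```python
import random


def random_selection(population, n, rng):
    """Draw n members at random from the whole target population."""
    return rng.sample(population, n)


def random_assignment(sample, rng):
    """Shuffle an already-obtained sample and split it into two groups."""
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


rng = random.Random(0)               # fixed seed so the sketch is repeatable
population = list(range(1000))       # hypothetical target population
convenience = population[:40]        # whoever happened to be available

# Random assignment equates the two groups with each other, but every
# subject still comes from the same convenience sample.
treatment, control = random_assignment(convenience, rng)

# Random selection gives every member of the population a chance of
# being included, which is what generalization to that population needs.
selected = random_selection(population, 40, rng)

print(sorted(treatment + control) == convenience)
print(max(treatment + control), max(selected))
```

However the convenience sample is shuffled, no one outside it can enter either group, so the assignment step says nothing about the wider population.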

In 28 studies, the interpretation of the descriptive statistics used was
highly questionable. Nine of these were variations of correlation
studies. Two of these combined scores of students with scores of their
teachers in obtaining first-order correlations in multiple correlation
studies, a highly suspect practice (particularly with a teacher n of
eight in one study). In one of these two, it appears that data on
teachers and students were simply combined; in the other, the best we
can deduce is that the teacher's scores were assigned to each of his or
her students. The author of the latter study also concluded that the
obtained results provided limited support for the position that teachers
should be encouraged to focus their instruction around objectives. This
conclusion was based on the finding that teacher use of objectives
contributed one percent (one percent!) to the predicted variance of
student achievement (whereas the CAT and pretest combined contributed 42
percent)! In both studies, the unnecessary complexity of analysis and
reported data virtually preclude the reader from determining what the
findings really were.

In six other studies, too much was made of correlations below .40. While
a case may sometimes be made for the importance of correlations of this
magnitude in testing theory or in unusual practical applications (e.g.,
prediction with a very small selection ratio), one can hardly pay
serious attention to correlations of this size when the variables are
"historical understanding" and "information processing capacity"
(r = .14); "economic knowledge" and "attitude toward the American
economic system" (r = .28); "positive interracial contact" and
"satisfaction with university life" among black females (r = .22); and
IQ vs. close-mindedness and self-esteem (r = -.24; .29), even though
statistically significant due to large n's. Another study states that
"some modest school effects were found for political interest, political
alienation, and anti-Vietnam war attitudes." The multiple correlations
based on five school variables plus IQ and socioeconomic level were,
respectively, r = .39; r = .16; and r = .41; modest indeed, especially
since the particular schooling variables were weighted differently for
each attitude. Granted, the low reliability of instruments used may
limit the degree of correlation possible; this is another reason for
reporting reliabilities. In the absence of such data, however, one
cannot assume that correlations would be higher with more reliable
instruments. One study illustrates this point in reverse. The author
dismisses a correlation of .53 because it is reduced to .32 when a
subgroup of restricted range is analyzed. This, when the respective
reliabilities of his two instruments are maximally .64 and .96! (See
note 4.)
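
Two pieces of arithmetic lie behind this paragraph: the ceiling that instrument reliability places on an observed correlation (the calculation given in note 4), and the share of variance that a small correlation represents. The following is a minimal sketch; the function names are our own labels, and the inputs are figures quoted in the text.

```python
from math import sqrt


def max_correlation(reliability_x, reliability_y):
    """Upper bound on the correlation between two measures, given the
    reliability of each (square root of the product of reliabilities)."""
    return sqrt(reliability_x * reliability_y)


def shared_variance(r):
    """Proportion of variance two variables share, given correlation r."""
    return r ** 2


# Reliabilities of .64 and .96, as in the study discussed above.
print(round(max_correlation(0.64, 0.96), 2))

# Small correlations quoted in the paragraph, as shares of variance.
for r in (0.14, 0.22, 0.28):
    print(f"r = {r:.2f} -> {shared_variance(r):.1%} of variance shared")
```

With a ceiling near .78, an observed correlation of .53 is far from trivial, while a correlation of .14 corresponds to about two percent of shared variance.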

We found 12 studies using the group comparison model which contained
highly questionable interpretations. In one quasi-experimental study,
the authors concluded, on the basis of non-significant t tests (n = 49
in each group), that there was a "lack of major effects on the attitude
of MACOS students," while admitting that the MACOS group became slightly
more tolerant of repugnant activities than did the non-MACOS group.
Examination of the change in total test score means, however, shows that
the MACOS group showed a change of -2.94 compared to -.48 for the
comparison group. Estimation of the standard deviation of change scores
for the comparison group suggests an effect size of .6 to .7, an
impressive difference even though not statistically significant. The
authors of another, otherwise well-done, experimental study concluded
that one of four teaching strategies was the most useful and devoted
considerable space to discussing why this might be so. This, despite the
finding that this was the poorest of the four methods for one of their
four interaction subgroups (female poor readers), while another method
was appreciably better. The authors of this study also committed the
error of assuming that non-significant differences on pretests are
tantamount to groups being equal. Regressed gain scores should have been
used, since pretests were given expressly to check on the efficacy of
random assignment in equating groups.

One of two hypotheses tested in a quasi-experimental study was that
regular value analysis discussions would increase students' social
trust, social integration, political confidence, and political interest,
as compared to reading-only and control groups. Under the results
section of the study, the authors concluded that "there is some evidence
to support the hypothesis." They go on to state that while the value
analysis group did score significantly better statistically than the
reading-only group, the difference between the two groups was minimal.
In addition, the control group scored significantly higher than did the
reading-only group on two of the measures. They then concluded that the
results offered only modest and mixed support for the hypothesis. In
actuality, the adjusted means for the value analysis and control groups
were very similar. The only meaningful finding is the lower scores for
the reading-only group. The authors provide plausible interpretations as
to why this group may have scored lower while the control group scored
high, but such ex post facto speculation cannot obviate the finding that
there was no support in the data for the hypothesis. In another study, a
low correlation between pre and post scores on an attitude scale (single
group) was interpreted as indicating true change after eight weeks of
summer school, rather than the more probable low reliability of the
scale.

In most of the surveys where we questioned the interpretation, the
reason was lack of supporting data. In one case, however, the author
simply ignored data that was presented. By combining the categories
"agree" and "slightly agree," the interpretation of differences in
attitude toward different social studies traditions was, in fact,
obscured.

For the most part, the errors described above appear to support the
opinion, increasingly voiced (e.g., see Carver 1978; Shaver and Norton
1980), that inferential statistics play too important a role in current
research efforts. Not only are they, with rare exception, mathematically
or logically indefensible, but they also can obscure the real findings
of a study. Perhaps it is time for the profession to consider using
descriptive statistics more meaningfully, rather than continuing to
foster the use of elegant but inappropriate inference tests.

The breakdown with regard to the analysis of 
data in these studies is shown in Table 15. 

Legitimacy of Conclusions. Were the con- 
clusions reached by the authors of these studies 
justified? This, perhaps, is the most important ques- 
tion addressed in this review. In attempting to 
answer it, we decided to focus on the extent to 
which the conclusions drawn by the authors seem 

TABLE 15

RESULTS: DATA ANALYSIS

                               TRSE         JSSR         SE           Total

Descriptive Statistics

Use correct?
  Yes                          34 (74%)     51 (92%)     17 (100%)    102 (86%)
  No                           5 (11%)      2 (4%)       0 (0%)       7 (6%)
  Questionable                 1 (2%)       2 (4%)       0 (0%)       3 (3%)
  N/A a                        6 (13%)      0 (0%)       0 (0%)       6 (5%)

Totals                         46           55           17           118

Interpretation correct?
  Yes                          26 (57%)     38 (69%)     14 (82%)     78 (66%)
  No                           9 (20%)      7 (13%)      0 (0%)       16 (14%)
  Questionable                 0 (0%)       9 (16%)      3 (18%)      12 (10%)
  N/A                          11 (23%)     1 (2%)       0 (0%)       12 (10%)

Totals                         46           55           17           118

Inferential Statistics

Technique correct?
  Yes                          28 (61%)     35 (63%)     13 (76%)     76 (64%)
  No                           1 (2%)       0 (0%)       0 (0%)       1 (1%)
  Questionable                 0 (0%)       2 (4%)       0 (0%)       2 (2%)
  Can't tell                   0 (0%)       1 (2%)       0 (0%)       1 (1%)
  N/A a                        17 (37%)     17 (31%)     4 (24%)      38 (32%)

Totals                         46           55           17           118

Interpretation correct?
  Yes                          3 (7%)       1 (2%)       1 (6%)       5 (4%)
  No c                         26 (56%)     37 (67%)     11 (65%)     74 (63%)
  Questionable                 0 (0%)       0 (0%)       1 (6%)       1 (1%)
  N/A b                        17 (37%)     17 (31%)     4 (24%)      38 (32%)

Totals                         46           55           17           118

a N/A indicates that statistics were not reported nor considered necessary.

b N/A indicates that statistics were not reported. In some cases we think they should have been.

c A rating of "no" indicates at the very least no mention of violation of the underlying assumption of
random sampling.

defensible within the confines of the study itself. We deliberately
excluded the important issue of generalizability, which would have
resulted in a much more negative evaluation (see Table 13). The main
factors influencing our judgment were: (1) adequacy of instrumentation,
(2) severity of threats to the internal validity of the study, and (3)
adequacy of the interpretation of data. (In addition to the weaknesses
discussed previously, we found an all-too-common tendency to make
cause-effect statements in much stronger terms than were justified.) In
our judgment, the conclusions reached by the authors were clearly
justified in only 20 (44 percent) of the studies published in TRSE, 27
(49 percent) of those in JSSR, and nine (53 percent) of those in SE
(Table 16).

Educational Significance of Studies. Researchers often talk about the
statistical significance of their findings, but they just as often fail
to talk about the significance of their results in any larger sense. Why
are the results of a study important, and to whom? What is the practical
significance of a study's results? Why do they matter (or do they)? We
asked ourselves these questions as we read these studies and attempted
to weigh them in this light. Would the results of any of these studies
make a difference to teachers and other professionals? In our judgment,
many of them would not. We give our impressions in Table 17. The phrase
"can't tell" indicates we were so confused by the study as to be unable
to judge its significance. Although we almost always agreed, we
acknowledge that we cannot clearly articulate the basis for this
judgment.



Relevance of Citations. Table 18 indicates our 
judgment of the relevance of citations for the topic 
of a study. In our judgment, the references in some 
studies had little direct relevance to the study in- 
volved. 

Notes 

1. We do not cite specific studies critiqued because we have no desire to
engage in destructive criticism or to embarrass authors. We will be happy,
however, to provide citations to interested readers who wish to assess the
accuracy of our specifics.

2. By ex post facto research, we mean any study in which an investigator
seeks an explanation for findings that have already occurred. Suppose, for
example, that an administrator in a large, urban high school notices that
the end-of-year test scores for students in a particular social studies
teacher's classes are markedly higher than those of the students of other
teachers, and have been for several years. She wonders why, and decides to
compare several variables of the two groups (characteristics of the
students, materials used, teaching style, etc.) in an attempt to gain
insight into why this is the case. The differential results, however, have
already occurred, and the administrator is seeking an explanation for these
results after the fact.

3. We include researcher effects under in- 
strumentation. 

4. $\text{Maximum } r_{12} = \sqrt{r_{11}\, r_{22}} = \sqrt{.64 \times .96} = .78$
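
The arithmetic in note 4 can be checked numerically. The following is a minimal sketch and not part of the original analysis. It assumes that the two quantities under the square root are the reliability coefficients of the two measures being correlated; the function name `max_correlation` is hypothetical and used only for illustration.

```python
import math


def max_correlation(reliability_1: float, reliability_2: float) -> float:
    """Upper bound on the observed correlation between two measures,
    given the reliability coefficient of each (assumed interpretation)."""
    for r in (reliability_1, reliability_2):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie in [0, 1]")
    return math.sqrt(reliability_1 * reliability_2)


# Values taken from note 4: reliabilities of .64 and .96.
bound = max_correlation(0.64, 0.96)
print(round(bound, 2))  # 0.78
```

Under this reading, a correlation between the two measures much above .78 would not be expected even if the underlying traits were perfectly related.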







TABLE 16

RESULTS: LEGITIMACY OF CONCLUSIONS

Were the Conclusions
of the Study Legitimate?    TRSE        JSSR        SE          Total

Yes                         20 (43%)    27 (49%)     9 (53%)    56 (47%)
No                          13 (28%)    13 (24%)     1 (6%)     27 (23%)
Questionable                 3 (7%)     14 (25%)     7 (41%)    24 (20%)
Can't tell                  10 (22%)     1 (2%)      0 (0%)     11 (10%)
Totals                      46          55          17         118






TABLE 17

RESULTS: EDUCATIONAL SIGNIFICANCE

Were the Outcomes
Educationally Significant?  TRSE        JSSR        SE          Total

Yes                         22 (48%)    15 (27%)     7 (42%)    44 (37%)
Questionable                12 (26%)    13 (24%)     1 (5%)     26 (22%)
No                          10 (22%)    27 (49%)     9 (53%)    46 (39%)
Can't tell                   2 (4%)      0 (0%)      0 (0%)      2 (2%)
Totals                      46          55          17         118








TABLE 18

RESULTS: RELEVANCE OF CITATIONS

Relevance
(1 = high; 5 = low)         TRSE        JSSR        SE          Total

1                           17 (37%)    10 (18%)     5 (29%)    32 (27%)
2                           10 (22%)    22 (40%)     1 (6%)     33 (27%)
3                           17 (37%)    19 (35%)     7 (41%)    43 (36%)
4                            2 (4%)      3 (5%)      4 (24%)     9 (9%)
5                            0 (0%)      1 (2%)      0 (0%)      1 (1%)
Totals                      46          55          17         118
CHAPTER 2
SOME SUMMARY OBSERVATIONS
ABOUT SOCIAL STUDIES RESEARCH



What does our analysis of these studies reveal? In general, progress over
time appears to be slow. Much still can be done, it appears, to improve the
quality of social studies research. We offer the following observations.

Methodology 

Experimental and survey research methodologies predominate. Of the 118
studies we reviewed, 47 (40 percent) were either true or quasi-experiments,
and 43 (36 percent) were either questionnaire- or interview-type surveys,
for a total of 90 (76 percent). Recent reviews of research (e.g., Armento
1986; Stanley 1985) document that these types of research continue to
dominate our field. Other forms of research, such as historical inquiries
and ethnographic studies, are much less commonly found, both in doctoral
dissertations and in our research journals, although they do occur. Some
research methodologies, such as causal-comparative investigations, are
truly rare. In our review, we found only three (2 percent) studies that
were causal-comparative investigations.

We think this is too narrow a vision of research to dominate the field. The
term research means any sort of "careful, systematic, patient study and
investigation in some field of knowledge, undertaken to discover or
establish facts and principles" (Webster's New World Dictionary 1984). Many
methodologies fit this definition. Additional models that could (and
should, we think) be utilized by social studies educators more frequently
include case studies; content analyses; intensive, in-depth interviews
(particularly when used to illuminate student comprehension); historical
inquiries; correlational studies; structured observations; participant
observations; causal-comparative investigations; ethnographic studies; and
cross-cultural comparisons. Out of 239 articles published in the three
journals (of which we reviewed 118), only 20 were content analyses; 19 were
correlational studies; 11 were ethnographies; eight were historical
inquiries; three were structured observations; and three were case studies.
There was one each of participant observations and cross-cultural studies.

We think that all of these research methodologies have value, since each
constitutes a different way of inquiring into the realities that exist
within social studies classrooms and the minds and emotions of social
studies students, teachers, and other professionals. While all of these
methodologies (as well as experiments and surveys) have various limitations
(and thus can be well or poorly executed), their wider use would help to
provide some additional, and different, perspectives about important
questions in social studies education. It is encouraging to note that,
while they still remain relatively few compared to the more common forms of
experimental or survey research, more studies using some of these
alternative methodologies are being reported in the social studies research
literature (Armento 1986; Stanley 1985).

Many research questions in the social studies can be studied through
experiments or surveys, but they also might well be investigated by other
methodologies. Indeed, some of the other methodologies that we have
mentioned are often better suited to providing the information desired. We
believe that research in social studies education should ask a variety of
questions, move in a variety of directions, encompass a variety of
methodologies, and use a variety of tools. Different research orientations,
perspectives, and goals should not only be allowed, but encouraged.

Focus/Clarity 

In general, the authors made clear the focus and variables of their
studies. Definitions were somewhat better than expected, though lacking in
some 30 percent of the studies. These researchers sought primarily to
understand more clearly or in more detail various aspects of the field. The
great majority did not try to point up inaccuracies, distortions,
ideological bias, etc., but rather to understand more fully the outcomes of
particular methods/techniques, the characteristics of students, and the
characteristics and opinions of social studies professionals. Although this
comment is in line with what other reviewers have observed, a growing
amount of research of the former type is being reported (e.g., see Anyon
1978; Giroux and Penna 1979; Popkewitz, Tabachnick, and Wehlage 1981;
Romanish 1983; Saltonstall 1979). Although we found few empirical studies
of a critical nature in the three journals we reviewed, we did find many
arguments and position pieces (e.g., Cherryholmes 1982; Common 1982;
Cornbleth 1985; Egan 1980; Giroux and Penna 1979; Gordon 1985; Hahn and
Blankenship 1983; Holmes 1982; Hurst 1980; Romanish 1983; Stanley 1981;
Wasburn 1986).
Sample

Only seven (6 percent) of the 118 studies reviewed attempted to use truly
random samples, as compared with a total of 15 percent found by Shaver and
Norton (1980). Sample descriptions often left much to be desired.

Replication 

Unfortunately for the build-up of a knowledge base, we found only eight
studies (6 percent) that were replications of other work (four on each of
two topics). This continues to be a major failing of social studies
research. The social studies research community has not made a systematic
effort to build a cumulative base of knowledge about many of the important
questions of interest to the profession. Doctoral students continue, in the
main, to do isolated studies, often unaware that similar or related work is
being done by their counterparts elsewhere (Hepburn and Dahler 1985). Few
doctoral, and fewer master's, studies are expanded or developed further
once they are completed. As Shaver has remarked, there is "a failure in
many instances... to relate a piece of research to previous studies in any
sort of programmatic way. The consequences are, on the one hand, the
repetition of unproductive prior research and, on the other, a
disconnectedness of studies on similar topics. Both are counterproductive
to knowledge building" (Nelson and Shaver 1985, p. 410).

Internal Validity 

The internal validity of many studies, unfortunate- 
ly, was suspect. Threats that appeared in a large 
number of studies included a subject effect, in 
which characteristics of the subjects may have ac- 
counted for the results; an instrumentation effect, 
in which the data collection procedure may have 
acted to bias the results; a Hawthorne or John 
Henry effect, in which some of the subjects may 
have known they were part of an experiment of 
some sort; and mortality, where some of the sub- 
jects may have dropped out of the comparison 
groups in unequal amounts. A positive sign with 
regard to internal validity was that, in the interven- 
tion studies, it was clear that the treatment actually 
did occur. 

Reliability and Validity of Instruments

Reliability and validity checks on instruments were not performed in a
large majority of studies. Out of the 118 studies reviewed, 64 contained no
reliability checks whatsoever; in 91 of these studies, the researchers made
no attempt to check validity! These researchers appeared either to ignore
these issues or to accept unquestioningly evidence from prior data
collection. In some cases, such evidence did seem appropriate to the study
at hand, but in many cases, unfortunately, it did not. The absence of
reliability and validity checks continues to be a major failing in much
social studies research.

External Validity 

The external validity of these studies also proved deficient. In almost
three-quarters of the studies in all three journals, the authors either
explicitly or implicitly generalized to indefensible target populations; in
74 percent, they made no mention of ecological generalizability, thereby
implying that it should be taken for granted. Although these authors
generally used the correct statistics in analyzing their data, they often
interpreted their findings incorrectly, leading us to conclude that many in
the profession appear to lack adequate understanding of statistical
interpretation.

Theory 

Very few of the authors of these studies tried to connect their work to
some underlying theory. While the usefulness of theory in guiding and
organizing research can hardly be questioned, we doubt whether the
diversity of our field can be encompassed in any one theory. Most likely,
at least at present, we shall have to settle for theories that address
subtopics, such as the Merrill-Tennyson theory of sequencing examples in
concept attainment, which provided focus for one of the two replicated
topics in our review.

Investigators

In virtually all the studies we reviewed, as well as in virtually all the
articles published in the three journals (i.e., including the instrument
studies, the position pieces, the content analyses, the historical studies,
and any others that we did not review), the authors were college professors
(primarily) or other social studies professionals (supervisors, state
social studies specialists, etc.). Classroom teachers were noticeable by
their absence. We found only two studies in which a classroom teacher was
one of the researchers. Although we recognize the severe workloads under
which most social studies teachers labor, we lament their absence from
these reports. Accordingly, we shall offer some suggestions as to the kinds
of research investigations classroom teachers could conduct in Chapter 5.

Data Sources 

Where did these researchers get their data? In almost all of the studies we
reviewed, the data collected by the researchers appeared to come from one
of four main sources: (1) the performance of students in social studies
classes, (2) the opinions of students and/or teachers in schools, (3) the
viewpoints of social studies supervisors, social studies methods
professors, or other social studies professionals, and (4) various
documents, such as courses of study, curriculum guides, state frameworks,
etc.

Another approach to the problem of basic data gathering is the development
of centralized data bases that social studies researchers and others might
use and to which they could contribute. The details of both existing and
potential data bases are beyond the scope of this monograph, but three that
should be mentioned are:

• National Assessment of Educational Progress (NAEP)
Educational Testing Service
Princeton, NJ 08540

• The College Board 
45 Columbus Avenue 
New York, NY 10023-6917 

• High School and Beyond 
Center for Statistics 
Department of Education 
555 New Jersey Avenue, NW 
Washington, DC 20208-1310 

Topics Investigated 

What about content? What topics did these researchers study? How
significant were these topics? Almost all of these studies focused on
relatively narrow or (in our judgment) unimportant relationships, rather
than on important concepts and ideas, or important issues. This observation
is in line with what many other reviewers have noted (Ehman and Hahn 1981;
Metcalf 1963; Shaver 1979b; Shaver and Larkins 1973; Wiley 1977; Stanley
1985). A positive comment is that in the majority of these studies, the
authors did attempt to justify their research. We did a limited content
analysis of the topics investigated by the authors of these studies. Our
findings are shown in Table 19.

Such analyses, of course, obscure the specific questions addressed, but
they do give a picture of overall activity. Within these categories, the
only specific topics addressed in more than two studies were concept
acquisition and development (12 studies), student political attitudes and
opinions (five studies), student opinions on social studies content (five
studies), effects of teacher enthusiasm (four studies), student knowledge
of economics (four studies), and teacher evaluation of the use of
objectives (three studies). Noteworthy by their absence were any studies
that looked specifically at the learning of gifted students in social
studies, that compared the social studies learnings of different (i.e.,
ethnic, cultural, socioeconomic, etc.) subgroups of students, or that
analyzed existing data bases of the type mentioned above. Also missing were
investigations of social studies in different types of school settings
(e.g., urban vs. rural, public vs. private, etc.), in specialized (e.g.,
magnet, classics-oriented, comprehensive, etc.) schools, or in other lands.
Virtually all of the studies we reviewed investigated aspects of social
studies in the United States. We found only two articles (of 239 published)
in all three journals that described aspects of social studies education in
another country.

The variety of topics covered is not surprising in a field that is by
definition and tradition as diverse in its subject matter as social studies
and as influenced by community expectations and values (Berlak and Berlak
1981). As several critics have shown (Barr, Barth, and Shermis 1977;
Morrissett and Haas 1982; Newmann 1986; Stanley 1985), there is continuing
disagreement among social studies theorists and curriculum developers as to
what should be emphasized in our field. It is customary under such
conditions to call for renewed attempts to unify the field or at least
parts of it under some theoretical structure. We applaud such efforts, but
we do not believe researchers can, or will, await such developments.

As an alternative, we suggest that leaders in the field attempt to identify
and even prioritize important categories, topics, and/or questions for
researchers to investigate. Ehman and Hahn (1981) proposed several
categories in Part II of the 1981 National Society for the Study of
Education Yearbook (see pp. 60-78). Nelson and Shaver (1985) proposed a
list of questions in their chapter in the recent review of social studies
research (see pp. 408-410). We suggest another below. We believe that the
attention of researchers, above all else, should be directed toward finding
out what students know and how to help them learn. This is hardly a new
idea, but one worth repeating. Our preferences in this matter are strongly
influenced by our work with the late Hilda Taba. We believe that both the
content and process objectives she advocated provide a sound basis for
focusing research. Whether one agrees with her curriculum approach, we
believe the following questions are worth considering:

1. What is the present level and range of student knowledge and/or
commitment at different ages/grade levels with respect to major objectives
in social studies, namely:

a. identified concepts and ideas?

b. identified cognitive skills?

c. identified values?
TABLE 19

A CONTENT ANALYSIS OF RESEARCH TOPICS
IN TRSE, JSSR, AND SE

Topic                              TRSE    JSSR    SE    Total

Characteristics of teachers
  or supervisors (and their
  effects)                            4       6     1       11
Attitudes or opinions of
  social studies educators            5      10     0       15
Characteristics of student
  teachers                            2       5     0        7
Characteristics of students          15       8     3       26
Teaching methods                     16      15    12       43
Pre- or inservice training            0       8     0        8
Program requirements                  3       3     0        6
Dissemination of innovations          1       0     1        2
Totals                               46      55    17      118



2. What is the present level and range of teacher capabilities with respect
to:

a. Their own knowledge and attitudes toward these objectives?

b. Their competence and/or aptitude for teaching these objectives,
especially with students of differing abilities, age, ethnicity, and
gender?

3. What are the attitudes of parents, school personnel, and school boards
toward these objectives?

4. What methods are effective in increasing understanding and support for
these objectives among the groups named in question #3?

5. How do the objectives mentioned in question #1 correspond to
developmental patterns in general cognitive abilities, interest, and
attitudes among students?

6. How do students of differing cultural, ethnic, and socioeconomic
backgrounds vary with regard to these objectives?

7. What general and specific teaching methods are effective in fostering
these objectives with different types of students and in different types of
schools and communities?

8. How can these methods accommodate important differences in student
readiness? If necessary, how can important readiness variables be assessed?

9. How can teachers best be helped to implement these methods? To what
extent must they develop their own methods or adaptations?

10. What factors, both within and without the school, hinder and help the
education of students with respect to these objectives?
11. To what extent does competence in these objectives generalize to other
subject matter and to daily-life activities?

Clearly, this is just a beginning. However, we believe some such systematic
approach to our field would permit researchers to focus their efforts in
ways that would contribute to a more integrated body of knowledge with
relevance to important issues of policy and practice. We also believe that
failure to study these issues is one reason why the "new" social studies
projects of the 1960s had less impact than was anticipated.

In conclusion, our analysis supports recent criticism of educational
research in general as being deficient in both application and discussion
of principles of good research with respect to sampling, internal validity,
instrumentation, and data analysis. We also concur that both topics and
methodology are too narrow. On the positive side, we were pleasantly
surprised at the general quality of justifications, clarity of focus and
terminology, documentation of treatment implementation, adequacy of
treatment time, and distinction between results and interpretations.

Notes 

1. These also are the methodologies most commonly found in social studies
doctoral dissertations. Based on a review of the abstracts of some 394
doctoral dissertations written between 1977 and 1983, Hepburn and Dahler
found that descriptive studies comprised 45 percent, or 177 of the total.
Experimental research comprised 27 percent, or 105 of the total. Thus, the
two together totaled 282, or 72 percent of the total (Hepburn and Dahler
1985, pp. 77-78).

2. We also recommend the analysis of existing data bases, which we believe
is an area of research that has been largely ignored to date by the social
studies research community (see page 25 for a list of a few of these data
bases).
CHAPTER 3

SUGGESTIONS FOR IMPROVING THE QUALITY
OF SOCIAL STUDIES RESEARCH



In light of the foregoing analysis and observations, we wish to offer some
suggestions we think could improve the quality of social studies research.
The remarks in this section are directed to three distinct groups of social
studies educators: (1) professors who direct master's theses or doctoral
dissertations in social studies education, but who do not teach courses in
educational research, (2) graduate students in social studies education who
intend to do research, and (3) classroom teachers, curriculum directors,
and administrators who have an interest in research.

Before we offer our suggestions, however, we wish to make a distinction
between the terms social education and social studies, for the remarks that
follow have mainly to do with social studies research. The distinction to
which we subscribe is the one offered by Nelson and Shaver: social
education is "a term inclusive of the broad concerns of social knowledge,
social relations, social development, and social improvement, which are
among the goals of social studies, but go beyond schooling practices in
their intentions, activities, and research implications," whereas social
studies identifies "the schooling part of social education" (Nelson and
Shaver 1985, p. 401). Most of our suggestions apply primarily to research
in schools. Although some may apply to studies that take place outside of
schools, such studies are not the focus of these remarks.

What follows, then, are some ideas about how to improve the quality of
social studies research. Many of these ideas are suggested by the
weaknesses we noticed in our review in Chapter 1. Since experiments and
surveys remain the most commonly conducted types of research (72 percent of
the total in the studies we reviewed), more of our suggestions focus on
these methodologies than others. Space limitations prevent an extensive
discussion of other methodologies, but we offer a few ideas that frequently
seem to be ignored in practice. Since most social studies educators are not
trained in historical or ethnographic research, these methodologies in
particular seem to be logical candidates for further study (e.g., see Agar
1986; Barzun and Graff 1977; Bogdan and Biklen 1982; Carr 1967; Dobbert
1982; Gottschalk 1969; and Spindler 1982).

All of the ideas we present are relatively easy to implement. Very few are
new; most have been identified by one or more other observers.
Nevertheless, we believe that they bear repeating. Experience with our own
graduate students suggests that even those students who have had two or
three courses in research continue to make rather fundamental mistakes.
Furthermore, as our analysis in Chapter 2 revealed, many of these ideas
continue to be ignored in practice.

Improving Experimental Research 

1. De-emphasize random sampling. Obtaining a truly random sample is almost
an impossibility in school-based research, given today's organizational and
scheduling constraints. When and where possible, of course, random sampling
is to be encouraged. An alternative strategy, however, is to concentrate on
describing relevant demographics of one's sample (e.g., ages, gender,
ethnicity, IQ scores) in enough detail so that other researchers (and other
interested professionals) get a fuller picture of exactly who was involved
in the study. We believe the profession might profitably attempt to develop
guidelines as to the kind of description that ought to be provided.

Oftentimes, even in intact classes, random assignment of students to
treatment and control groups can be implemented. It should be recognized,
however, that this technique is really only effective with large groups (we
recommend at least 50 subjects per treatment group). When smaller groups
must be used, or when random assignment is not feasible, much more
attention should be given to matching (mechanically or statistically; see
note 1) groups on potentially related variables, as well as on the outcome
(dependent) variable(s).
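
Random assignment within an intact group can be carried out mechanically. The following is a minimal sketch, offered only as an illustration and not as a procedure described in this monograph. The roster, the group labels, and the function name `randomly_assign` are hypothetical.

```python
import random


def randomly_assign(students, groups=("treatment", "control"), seed=None):
    """Shuffle the roster, then deal students to groups in turn so that
    group sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment repeatable
    shuffled = list(students)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, student in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(student)
    return assignment


# Hypothetical roster of 100 students, giving 50 per group
# (the minimum group size recommended in the text).
roster = [f"student_{i:03d}" for i in range(1, 101)]
assigned = randomly_assign(roster, seed=1988)
print({group: len(members) for group, members in assigned.items()})
# {'treatment': 50, 'control': 50}
```

Recording the seed alongside the roster lets another researcher reproduce exactly who was placed in which group.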

2. Increase the chances of the treatment's having an effect. In essence,
this suggestion involves intensifying the treatment the experimental group
receives. There are three possibilities here:

a. Be clear that there is a treatment. Sometimes treatments are so vaguely
defined or described that exactly what happened to students in the
experimental group is not clear. Operational definitions of the independent
variable(s) can help clarify the nature of the treatment.

b. Lengthen the time of the treatment. Oftentimes, the length of time that
students are exposed to a treatment is so short that its possible effect(s)
may not be discerned (Wallen and Fraenkel 1988). Eisner (1983) found that
the median experimental treatment time per subject in the studies that he
reviewed in 1978 was only 45 minutes! One can take slight encouragement
from the fact that a review some five years later showed an increase in the
median experimental treatment time to one hour and 15 minutes per subject
(Eisner 1983, p. 14). Although the treatment time was considerably longer
in the studies we reviewed, it was still of concern in over one-third of
the total.

c. Check (through the use of observers, audio- 
or videotaping, subject reactions, etc.) to make 
sure that the treatment really occurs and that it oc- 
curs as intended. 

3. Concentrate on description and explanation more than prediction. Given
the difficulty in obtaining random samples in most school settings, the
generalizability of most social studies research will be severely limited.
This suggests the value of placing more emphasis on description and
explanation and less on prediction. Vividly described details of
interventions (or in non-intervention studies, of settings) can help others
in similar situations assess the applicability of particular results to
their situation. As mentioned above, the nature of the treatment should be
clearly and fully described. Exactly what happened? How? When? Where? Under
what conditions?

4. Use more than one instrument to measure the dependent variable. In the
great majority of social studies research, the researchers use only one
measuring device to obtain data concerning the outcome of interest. This
unnecessarily limits the amount of information gathered concerning the
possible effects of the independent variable(s). Use of a second instrument
also permits a check on concurrent validity. In our review of 118 studies
that used instruments, only 12 (10 percent) used more than one measuring
device to obtain data on the dependent variable(s).
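
One common way to check concurrent validity is to correlate scores from the two instruments for the same subjects. The sketch below is illustrative only. The scores are invented, the function name `pearson_r` is hypothetical, and the text does not prescribe this particular coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Invented scores for eight students on two instruments meant to
# measure the same outcome (e.g., a written test and an interview rating).
written_test = [12, 15, 9, 20, 17, 11, 14, 18]
interview_rating = [3, 4, 2, 5, 4, 3, 3, 5]
print(f"concurrent validity coefficient: {pearson_r(written_test, interview_rating):.2f}")
```

A high positive coefficient suggests the two instruments are tapping the same outcome; a low one is a signal to examine what each is actually measuring.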

5. Pay more attention to alternative explanations of findings due to
"mortality" and "Hawthorne effect" threats. We found a sizable number of
studies in which these were concerns (16 and 20, respectively). If subjects
are lost to a study, researchers should attempt to determine whether the
proportion was about the same for all treatments and whether the causes
were likely to favor certain treatment groups. If, for example, the
experimental treatment is a difficult one for students, and hence those
"lost" are those having the most difficulty (they may change groups or just
absent themselves from treatment or testing), the data on those remaining
in that group would not reflect the lower performance of the absentees.

A possible Hawthorne effect exists whenever one group receives any sort of
special attention. This threat is hard to control in studies involving a
major curriculum modification, since provision for special treatment of
comparison groups is often not feasible (or is artificial). Despite the
difficulties presented by these two threats, they should receive more
attention than appears currently to be the case.
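
Whether the proportion of subjects lost was about the same for all treatments is easy to check once starting and ending counts are recorded. The following is a minimal sketch with invented counts; the function name `attrition_rates` is hypothetical.

```python
def attrition_rates(started, finished):
    """Return the proportion of subjects lost from each group."""
    rates = {}
    for group, n_start in started.items():
        n_end = finished[group]
        if n_start <= 0 or not 0 <= n_end <= n_start:
            raise ValueError(f"inconsistent counts for group {group!r}")
        rates[group] = (n_start - n_end) / n_start
    return rates


# Invented example: the experimental group lost 12 of 60 subjects,
# the comparison group lost 3 of 60.
started = {"experimental": 60, "comparison": 60}
finished = {"experimental": 48, "comparison": 57}
rates = attrition_rates(started, finished)
for group, rate in rates.items():
    print(f"{group}: lost {rate:.0%} of subjects")
```

In this invented case the experimental group lost four times the proportion the comparison group did, which is the kind of imbalance that calls for asking who left and why before interpreting the outcome data.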

6. Study more than one dependent variable. Rarely do social studies
researchers look at more than one dependent variable when studying the
effects of a particular treatment. Once again, this unduly restricts the
amount of information that might, with only a little extra effort, be
obtained. It also weakens understanding of the possible effects of an
independent variable. Theory or experience usually suggests that a
treatment will affect more than one outcome variable. Further, unintended
or unanticipated outcomes should be studied to the extent feasible. It is
not very difficult, for example, to measure the attitudes of students in
studies where achievement is the dependent variable (e.g., see Smith 1980).
We are not suggesting that additional variables be included merely for the
sake of addition. A clear and defensible rationale is always required (see
note 2).

7. Incorporate additional independent variables into your design. Many
times the effect(s) of a treatment may be predictably revealed in one or
more subgroups, yet not appear in the total group of which the subgroups
are a part. Analyzing a treatment group in terms of gender or ethnic
components, for example, may reveal otherwise unrecognized effects.
Factorial designs that enable a researcher to study several independent and
dependent variables in a single study are almost never employed in social
studies research.
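
A small numerical illustration shows how an effect can appear in subgroups yet vanish in the total group. The scores below are invented for the purpose, and the subgroup labels "A" and "B" and the function name `mean_difference` are hypothetical.

```python
from statistics import mean

# Invented records: (condition, subgroup, outcome score).
records = [
    ("treatment", "A", 78), ("treatment", "A", 82),
    ("treatment", "B", 58), ("treatment", "B", 62),
    ("control", "A", 70), ("control", "A", 74),
    ("control", "B", 66), ("control", "B", 70),
]


def mean_difference(records, subgroup=None):
    """Treatment mean minus control mean, optionally within one subgroup."""
    def scores(condition):
        return [score for cond, sub, score in records
                if cond == condition and (subgroup is None or sub == subgroup)]
    return mean(scores("treatment")) - mean(scores("control"))


overall = mean_difference(records)             # no difference in the total group
effect_a = mean_difference(records, "A")       # treatment helps subgroup A (+8)
effect_b = mean_difference(records, "B")       # treatment hurts subgroup B (-8)
print(overall, effect_a, effect_b)
```

Here the two subgroup effects are equal and opposite, so an analysis of the total group alone would report no effect at all.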

8. Discuss the magnitude of any effects ob-
served. Social studies researchers commonly
report their findings in terms of significance levels,
using inferential statistics, but the notion of statisti-
cal significance is intimately related to sample size.
Given a large enough sample, almost any result
will be statistically significant. Whether a finding is
significant only tells us the likelihood of an effect
occurring by chance; it does not allow us to com-
pare effects across studies of similar phenomena.
As many observers have suggested, the calculation
of an effect size is helpful in this regard (Borg and
Gall 1983; Nelson and Shaver 1985; VanSickle
1983). Similarly, the reporting of the percent of
variance accounted for (eta squared) provides another in-
dication of magnitude.
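
As a rough illustration of this recommendation, the sketch below computes two common magnitude indices for a two-group comparison. The pooled-standard-deviation form of the effect size (often called Cohen's d) is one convention among several and is an assumption here, as are the function names and the scores, which are invented for illustration only.

```python
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd


def eta_squared(*groups):
    """Proportion of total variance accounted for by group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total


# Hypothetical posttest scores, invented for illustration only.
treatment = [14, 16, 15, 18, 17, 15]
control = [13, 15, 14, 16, 15, 13]

d = cohens_d(treatment, control)
eta2 = eta_squared(treatment, control)
print(f"Cohen's d = {d:.2f}, eta squared = {eta2:.2f}")
```

Unlike a significance level, neither index grows merely because the sample is larger, which is why both lend themselves to comparison across studies.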

9. Be less concerned about statistical sig-
nificance and think more about educational sig-
nificance (despite the difficulty of assessing the lat-
ter). The significance of a study continues, for most
social studies (and other) researchers, to mean
statistical significance. That the results of a
study are statistically significant (unlikely to be due to
chance), however, does not mean that they are sig-
nificant in any larger sense. The import of a study
(how it matters in the larger scheme of things, to
students, to teachers, to the profession as a
whole) is rarely discussed. Researchers should
watch for noticeable effects whether they are statis-
tically significant or not.

In particular, the emotional reactions of students 
should be assessed if at all possible. How strongly 
did they react to a particular treatment or ex- 
perience? Why do they say they react in this way? 
Do different groups react differently? When stu- 
dents react strongly (either positively or negatively) 
to an intervention or an experience, further inves-
tigation is probably warranted. Of the 118 studies
we reviewed, only five (4 percent) assessed stu- 
dent reactions to social studies subject matter; 
only two (1.6 percent) assessed student attitudes 
toward some aspect of social studies instruction!

10. Assess the durability of an effect. Delayed 
posttests are virtually never given to see whether 
the perceived effects of an independent variable 
remain over any length of time or change in any
way (Leming 1985). The durability of the effects of
independent variables in social studies research
remains largely unknown. 

11. Make better use of descriptive statistics.
Whether or not we are correct in believing that one of the
causes is overemphasis on inferential statistics, it is
clear that many of the studies we reviewed inap-
propriately used and/or interpreted basic descrip-
tive indices. We agree with Kerlinger (1986) that ex-
cessive reliance on computer packages may be fur-
ther contributing to this problem. We encourage re-
searchers to stay closer to their data and pay
greater attention to such simple indices as
medians (in addition to means), as well as to fre-
quency polygons and scatterplots, both of which
can be easily obtained through computer analysis.
We recommend that much more thought be given
to both the magnitude and the pattern of group dif-
ferences found and their implications, which may
be quite different for questions that are primarily
theoretical as compared to practical.

12. Give more attention to the interpretation of
results. Most of the studies we reviewed did a
good job of keeping results and interpretations dis-
tinct. Most often, however, the larger meaning of
results was inadequately discussed. Too often,
authors discussed unwarranted direct applications
(population generalizability).

We would recommend more discussion of the
implications of results in the context of both prac-
tice and theory. For example, a study finding that
understanding of social studies ideas is correlated
with level of general concept development in
young children implies that (a) teachers may need
to assess level of concept development, and (b)
developmental theories with regard to concepts
apply to social studies content. Such discussion
would help others decide whether the replication
needed to generalize is worth the effort.

Improving Survey Research 

1. Trial-test all questionnaires or interview
schedules. Of the 43 survey studies we reviewed,
only two indicated that the questionnaire or inter-
view schedule used was checked beforehand. Pilot
testing with a small group similar to the group to
whom the questionnaire or interview schedule is to
be administered can help reveal lack of clarity,
bias, and/or ambiguity in questions before it is too
late to change them.

2. Check the validity and reliability of the ques-
tionnaire or interview schedule being used. Many
studies reporting survey results do not indicate if,
or how, the validity and reliability of the survey in-
strument were checked. Like any measuring instru-
ment, a questionnaire or interview schedule needs
to be checked for reliability and validity to ensure
that data obtained is related to what the researcher
is trying to assess. Out of the 118 studies we
reviewed, only 21 (46 percent) in TRSE, 26 (47 per-
cent) in the JSSR, and seven (41 percent) in SE
made some attempt to check instrument reliability,
while a startling 32 (70 percent) in TRSE, 46 (84
percent) in the JSSR, and 13 (76 percent) in SE
made no attempt to check validity! Content validity,
at least, can be assessed through the use of inde-
pendent judges who rate the questions to be
asked in terms of whether they measure the vari-
ables the researcher has in mind. The researcher
can then revise any to which the judges object.

Many investigators appear to think that validity is
unimportant when factual questions are asked.
They need to remember that it is not the fact itself
that is of concern, but the way in which the factual
information is obtained. This certainly can lead to
invalid interpretation. It is often difficult, but not im-
possible, to ask for the same factual information in
more than one way, as Kinsey and his associates
demonstrated forty years ago (Kinsey, Pomeroy,
and Martin 1948).

3. Think about the length of the questionnaire or
interview schedule. It should be neither too long
nor too short. The proper length, of course, is a
matter of judgment, but researchers need to con-
sider whether their instruments are sufficiently long
to provide them with enough information concern-
ing what they are looking for, yet not so long that
respondents become tired, bored, or careless. The
length of a survey instrument may seem too ob-
vious a point to mention, but almost everyone has
neglected to respond to a survey at least once be-
cause the length of the questionnaire discouraged
us from doing so.

4. Check for sampling bias. How representative
is the accepting sample (those who actually
respond to the questions) of the specific group
being surveyed? This depends, of course, on the
percentage of responses returned. When a substan-
tial percentage of responses is not received (we
think more than 20 percent), representing the find-
ings as indicative of the invited sample may be mis-
leading. (This happened in many of the survey
studies we reviewed.) A possible check on this is
to interview a small sample of nonresponding sub-
jects to see how, or if, their views differ markedly
from those of the respondents. A second (or even
a third) administration of the questionnaire can also
help increase the percentage of responses
returned. Showing that respondents are similar to
invitees with respect to at least some demographic
variables permits additional confidence in generaliz-
ing findings.

5. Check respondent knowledge about the sub-
ject before or during administration of the question-
naire or interview schedule. This is to make sure
that respondents actually possess some 
knowledge concerning what they are to be ques- 
tioned about. Otherwise, the researcher cannot be 
sure that their replies represent what the respon- 
dents actually know about the issue(s) being sur- 
veyed. 

6. Try to make sure that you and your respon- 
dents speak the same language. Several years of 
experience in helping students design question-
naires have shown us that this cannot automatical-
ly be assumed. Sometimes a particular term can
mean the exact opposite of what the researcher in-
tends. Babbie (1986, p. 230) described an example
in which the word "very" in the colloquial language
of Appalachia apparently was closer to what
people in other parts of the country mean by "fair- 
ly" or even "poorly." Thus, when residents of the 
area responded "very well" to an inquiry about 
their health, they actually meant that they were just 
getting along. The best "solution" to this problem 
is a prior tryout that includes questions (preferably 
in interview form) specifically directed toward the 
meaning of terms. 

7. Train all individuals who will administer an in-
terview schedule to ensure that they are able to ad-
minister it correctly. Such training helps ensure that
the data obtained will be both reliable and valid. 
Training should include a trial run to check on the 
manner of administration. Use of videotapes to 
provide feedback can be very helpful. 

8. Try to make sure that both researcher and
respondents are operating from the same frame of
reference; that is, respondents must be clear
about what the researcher expects regarding the
questions being asked. This guards against differen-
tial expectations leading to erroneous interpreta-
tions by the researcher. For example, if a re-
searcher were to ask, "What do you think about
what goes on in your history class?", one student 
might talk about the kinds of activities used by the 
teacher; another might comment on the homework 
assignments; yet another might talk about the 
teacher's way of questioning students. Others, un- 
sure of what the questioner wants, might not 
respond at all. A less ambiguous question might
be: "What do you think of the way your teacher 
conducts class discussions?" The important point 
here is that the researcher must make clear to 
respondents exactly what he or she wants them to 
respond to or comment about. 

9. Don't use an observation form with too many 
categories. Researchers must take care that their 
observational measuring instruments (e.g., tally
sheets, flow charts) are neither too long nor too 
short. Overly long observation instruments require 
too much of observers, while overly short ones 
produce only a partial analysis of what is observed. 
The difficulty involved in using an overly compli- 
cated tally sheet has been the downfall of many a 
graduate student. 

10. Check on the interrater agreement of inde- 
pendent observers to ensure a high degree of 
reliability (we would argue for at least .90). 
Reliability should be reported, using internal consis-
tency indices where appropriate. Stability over time
should also be checked. 
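
One way to check interrater agreement is sketched below. The text does not say which index the .90 criterion refers to, so the sketch reports simple percent agreement alongside Cohen's kappa, a chance-corrected index that is an addition here rather than something the text prescribes. The category codes and ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of observations that both raters coded identically."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters use a single category throughout.
    """
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten classroom events:
# Q = teacher question, L = lecture, D = discussion.
rater_a = ["Q", "Q", "L", "L", "D", "Q", "L", "D", "Q", "L"]
rater_b = ["Q", "Q", "L", "L", "D", "Q", "L", "D", "L", "L"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Because kappa discounts agreement that would occur by chance, it runs lower than raw percent agreement on the same ratings; a report should state which index was used.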

11. Be sure to take a random or systematic sam-
pling of whatever is being observed. Observing just
the beginning of a class, for example, can mislead 
researchers. Many reports of observations in social 
studies classrooms do not make clear exactly 
when, or during what period of time, the observa-
tions took place. Typically, a sizable number of ob-
servation periods (eight or more) is necessary to
achieve adequate reliability. 

Improving Correlational or Causal-Comparative
Research 

1. Be careful not to imply that correlation indi- 
cates causation. Although the fact that correlation 
does not mean causation is one of the most fre- 
quently mentioned caveats in research courses 
and research texts (e.g., Borg and Gall 1983; Ker-
linger 1986; Vockell 1983; Wallen 1974; Wiersma
1987), many studies still imply, on the basis of a
significant correlation, that a cause-and-effect 
relationship exists. 

2. Don't confuse statistical significance with
educational (or practical) significance. This error is
similar to that found so often in experimental
studies. Interpretation of the magnitude of a correla-
tion coefficient continues to be one of the most
misunderstood aspects of research in social
studies education. Correlation coefficients rang-
ing from .20 to .35 show only a slight relationship
between variables, even though they may be statis-
tically significant. A correlation of .20, for example,
indicates that only four percent of the variance in
the two variables that have been correlated is com-
mon to both. Such correlations have almost no
value in any practical sense. A correlation of at
least .50 must be obtained before any crude predic-
tions can be made concerning groups (although
they are usually of little help in making individual
predictions). Even then such predictions are fre-
quently in error (since they indicate only a 25 per-
cent common variance). It is only when a correla-
tion of .65 or higher is obtained that individual
predictions that are reasonably accurate for most
purposes can be made. Correlations over .85 indi-
cate a close relationship between the variables cor-
related and are useful in predicting both group and
individual performance, but correlations this high
are rarely obtained in social studies research (Borg
and Gall 1983). 3
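
The percentages of common variance cited above follow from squaring the correlation coefficient. The short sketch below does that arithmetic for the benchmark values mentioned in the text; the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance common to two variables correlated at r."""
    return r * r


# Benchmark correlations taken from the paragraph above.
for r in (0.20, 0.35, 0.50, 0.65, 0.85):
    percent = shared_variance(r) * 100
    print(f"r = {r:.2f} -> about {percent:.0f} percent common variance")
```

Squaring makes plain how quickly predictive value falls off: halving a correlation cuts the common variance to one quarter.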

3. Analyze as many relevant subgroups within 
the total sample being studied as possible. Many 
times, important relationships may be obscured 
when correlations are computed just for the total 
sample, rather than for certain subgroups within it 
as well. Sizable correlation coefficients may be 
found when subgroups (e.g., males and females) 
are examined. In analyzing subgroups, researchers 
should also examine the variability within each, 
since this affects the magnitude of the correlation. 

Improving Ethnographic Research 

1. Reflect on your own subjectivity. Ethnog- 
raphers have wrestled for years with the criticism 
that a researcher's biases can influence his or her 
descriptions. All research can be affected by per-
sonal bias. The task for all of us is to limit our bias.
One way to do this in ethnographic research is to
take into account one's biases by describing, in
detail, one's thoughts about what one is observing;
in effect, to write memos to oneself about what one
is thinking (Bogdan and Biklen 1982).

2. Do your best to "blend into the woodwork."
The subjects of a study often attempt to create a
false impression of themselves, especially during
the early stages. Teachers might not yell at any stu-
dents, for example, or be especially patient. Stu-
dents may be unusually cooperative. Principals
may disrupt their normal routines. Accordingly, the
researcher needs to act in such a way that the ac-
tivities and conversations that occur in the re-
searcher's presence are no different from those oc-
curring in his/her absence. A thorough under-
standing of the research setting is therefore crucial.
Certain data may not ring true. Some data, in fact,
may need to be discounted once it is interpreted in
context (Deutscher 1973).

3. Be a conversational rather than a formal ques- 
tioner. This idea is related to the suggestion above. 
A conversational form of interchange with subjects 
is more likely to engender natural, non-staged 
responses than is formal administration of an inter- 
view schedule or questionnaire. 

4. Take care that you are not unduly influenced 
by the most talkative subjects. Oftentimes, a re- 
searcher talks with certain students a dispropor- 
tionate amount of time compared to other students 
for the simple reason that they are the most willing 
to talk. This can result in misleading impressions 
and interpretations. You need not talk with all sub- 
jects for the same amount of time, but you should 
not rely exclusively on only a small number of sub- 
jects whose ideas may be somewhat atypical. Less 
talkative subjects should not be given up on too 
quickly. 

5. When appropriate, share your feelings about
experiences you observe with your subjects. A re-
searcher's feelings can help him or her establish
rapport with subjects and gain insight into their feel-
ings. Bogdan and Biklen described an instance in
which an observer was overwhelmed with a feeling
that things were out of control in a junior high
school cafeteria she was visiting for the first time.
When she mentioned her feelings in the teachers'
room, several teachers began to discuss their feel-
ings during their first few weeks on cafeteria duty.
Discussing her feelings enabled the observer to
gain insight into the feelings of the teachers in this
school that she otherwise might never have ob-
tained (Bogdan and Biklen 1982, p. 132).

6. When observing, practice describing rather 
than interpreting. Anthropologists work very hard at 
training themselves to avoid placing their own in- 
ferences into their basic data. No competent 
anthropologist, for example, would write in his or 
her field notes: "Ms. Jones punished Robert," 
which is clearly an inference, but rather something 
like "Ms. Jones told Robert to be still," or "Ms. 
Jones sent Robert to the office." Unfortunately, 
many applications of this methodology in educa- 
tion, including all of the ethnographies we 
reviewed, appear to be vulnerable to this criticism. 

7. Make a major effort to check information from 
more than one source (e.g., observations with inter- 
views, interviews with different informants). While 
this is a basic technique for validating all informa- 
tion, it is especially important in ethnography, since 
so much interpretation by the researcher is re-
quired.

Some Ideas for Improving Research in General

1. Make greater use of volunteers as subjects in
methods studies. It is standard advice that use of
volunteers is a serious threat to the generalizability
of a study and hence should be avoided. This is
true, but it is important to note that a negative
result (in intervention studies) when volunteers are
the subjects is a strong statement concerning the
effectiveness of the treatment. If a treatment does
not work with volunteers (whom we would assume
would be more motivated than most), this is a pret-
ty good indication it will not be effective with most
other subjects. Perhaps this should be the first
step in studying innovative methods.

2. Consider the context within which a study 
takes place. Much experimental and quasi-ex- 
perimental research, for example, involves only one 
classroom, at most a very few, in which a treat- 
ment is applied under atypical conditions. Hence 
the applicability of the results to what most social 
studies classroom teachers do on an ongoing 
basis is often hard to see (this may be one of the 
reasons why most classroom teachers pay little at-
tention to social studies research). Furthermore, lit-
tle attention is usually paid to the nature of the
school environment within which most teachers 
work, and whether it would be possible for 
teachers to manipulate students in ways similar to 
manipulation in research studies. Although we did 
not specifically evaluate studies on this issue, our 
overall impression is that virtually none addressed 
the issue of context. 

3. Indicate how the research relates to previous 
studies of the question at issue. Oftentimes there is 
no tie-in made to other, related work, nor any in-
dication of what other researchers have found with
regard to the same, or similar, questions. Attempt-
ing to relate one's own research efforts to the work
of others is another contribution that social studies
researchers could make relatively easily to the
building of a cumulative knowledge base in the
field. The variation shown in our assessment indi-
cates our judgment that the studies we reviewed
differed a great deal in this regard. 

4. Formulate and state a hypothesis when ap-
propriate. Many social studies researchers under-
take their investigations without formulating and
testing a prediction of some sort. Some critics
would argue that the generation of hypotheses
before a study begins limits the researcher's obser-
vations, in that he or she may overlook or ignore
data not related to the hypothesis. The value of for-
mulating a hypothesis, however, is threefold: (a) it
forces us to think more deeply about what we want
to investigate and often clarifies what outcome(s)
we are looking for, (b) it stimulates us to begin
thinking about how we can test our theories, and
(c) it encourages the development of a body of
knowledge. Many studies designed to investigate
the same hypothesis but containing different
moderator variables might contribute to the build-
ing of the knowledge base that the profession so
badly needs, yet at present does not have. Of the
118 studies we reviewed, only 29 (25 percent) con-
tained an explicitly stated hypothesis; another 43
(36 percent) contained an implied hypothesis.
Forty-six (39 percent) did not attempt to investigate
a hypothesis.

5. Be sure to define key terms clearly. The lack 
of clearly defined terms is one of the most com- 
mon findings in the literature. In much social 
studies research, the reader is unsure as to what
the researcher means by many of the terms he or 
she uses. Terms like active learner, critical think- 
ing, values development, citizenship education, 
and others are frequently not defined. Thirty per- 
cent (35 out of 118) of the studies we reviewed 
lacked any definition whatsoever of the terms in- 
volved. 

It would be helpful to define all key terms opera-
tionally, that is, to specify observable characteris-
tics, behaviors, or conditions (along with how they
can be measured). For example, defining motiva-
tion as a desire to learn is not very clear. A clearer
definition would be: "any statements or actions an
individual makes or takes which, in the judgment of
at least two teachers or counselors, indicate the in-
dividual's desire to learn."

6. Remember that instrument reliability is crucial.
Unless instruments are "sufficiently" reliable (a com-
plex matter that can be reduced to the rule of
thumb that the coefficient of reliability should ex-
ceed .70), you are probably wasting your time.
Checking internal consistency is usually a simple
matter. While other types of reliability do require
more elaborate data collection, they should be
seriously considered.
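
To show how simple an internal consistency check can be, the sketch below computes Cronbach's alpha, one widely used internal consistency coefficient. The text does not name a particular coefficient, so the choice of alpha is an assumption, and the responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])                    # number of items
    items = list(zip(*item_scores))            # regroup scores by item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: five students, four attitude items (1-5 scale).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}; exceeds .70 rule of thumb: {alpha > 0.70}")
```

A real check would use far more than five respondents; the small data set here only keeps the arithmetic easy to follow by hand.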

7. Pay more attention to the possibility of a re- 
searcher effect. Researchers can influence study 
outcomes by systematically (though unintentional-
ly) favoring certain treatment groups in either treat-
ment application, data collection, or both. We found
very little attention given to this issue by either eth-
nographers or more traditional researchers.

While it is true that standardization of proce- 
dures reduces this problem, a better guarantee of 
impartiality is ignorance, at least on the part of 
data collectors, who generally do not need to 
know the hypotheses or purposes of a study. In 
the studies we reviewed, the researchers appear to 
have been the data collectors in virtually all cases. 
While this may be legitimate and even necessary, 
the possibility of bias on the part of the researcher 
should at least be discussed.

8. Use more than one statistical tool to analyze
findings. Here again, a little extra preparation and
effort can pay dividends. Many, if not most, re-
searchers use only one statistical procedure when
they analyze their data. Most commonly, means and
standard deviations are computed. Frequently, ad-
ditional statistics can be computed and presented;
these include, as appropriate, percentages,
medians, ranges, correlation coefficients, and ef-
fect sizes. These statistics can provide additional in-
formation as to how various groups compare (e.g.,
see Powell and Powell 1984).

9. Finally, do more to replicate previous re-
search. Almost all research in social studies educa-
tion is done in isolation. With rare exception (e.g.,
see Larkins and McKinney 1982), the replication of
previous work under somewhat different settings,
with different subjects or modified treatments, is
simply not done. As Shaver has suggested, the sys-
tematic replication of research findings would not
only help "to establish their reliability and
generalizability" but also allow past research efforts
to be used "as a basis for designing studies to
correct methodological errors and build on past
findings" (Nelson and Shaver 1985, p. 411). We
find it hard to understand why the use of master's
theses to replicate significant studies, a common
practice in the physical sciences, has never caught
on in the behavioral sciences in general, and in so-
cial studies research in particular.

Further, we recommend that more researchers
cross-validate their research by checking their find-
ings with the findings of others who used different
methods. Thus, a researcher who found through in-
terviews that teachers said they asked certain
kinds of questions in class could check to see if
this finding is consistent with the findings of
another study using direct observation.

While this list is not intended to be exhaustive,
we believe it highlights many of the more obvious 
weaknesses we noticed in our review. In order to 
discuss some of these suggestions further, we 
analyze a single study in some detail in Chapter 4. 

Notes 

1. We recommend calculation of regressed gain 
scores rather than (or in addition to) use of the 
very similar, but non-identical analysis of 
covariance because regressed gain scores provide 
additional descriptive information (the adjusted 
gain score for each student) as well as means and 
standard deviations. 

2. We recommend that social studies re- 
searchers consider the sophisticated and potential- 
ly powerful techniques of confirmatory factor 
analysis and covariance structural analysis, which 
are combined in LISREL (Linear Structural Rela-
tions), a system incorporating computer analysis.
These types of analyses permit elegant and satisfy- 
ing clarification of some questions, but they do re- 
quire considerable mathematical, statistical, and 
computer sophistication. They also require a de-
gree of theoretical clarity that is currently lacking in
our field. We would caution further that such techni-
ques make many of our recommendations all the
more important to consider. 

3. We think the coefficient of determination (r²)
should also be reported. In addition, the reporting 
of beta weights permits evaluation of the mag- 
nitude of relationship analogous to the use of effect 
sizes with regard to means.

CHAPTER 4
A DETAILED ANALYSIS OF A SAMPLE STUDY 



In an effort to point up some of the criticisms 
and observations made earlier in this monograph, 
in this chapter we dissect a single study, using the
same instrument we used to analyze the various
studies in TRSE, the JSSR, and SE. We discuss 
both the strengths and weaknesses of this study as 
a way of reinforcing some of the ideas we have 
presented for improving social studies research. 

The study we analyzed was chosen for several 
reasons: 

• Though the study itself was conducted quite
some time ago, the focus of the study is cur-
rent. Critical thinking has recently reemerged
as a priority in the social studies curriculum. In
addition, the teaching method used in the
study remains a major approach supported by
critical thinking advocates.

• The methodology used by the researchers is
typical of current research efforts. 

• The use of a study in which one of us was the
lead author permits us to be more critical than 
we might otherwise choose to be. 

The Study 

The study is reproduced in its entirety below. 



THE OUTCOMES OF CURRICULUM MODIFICATIONS 
DESIGNED TO FOSTER CRITICAL THINKING* 

Norman E. Wallen, Vernon F. Haubrich, and Ian E. Reid, University of Utah** 

CRITICAL THINKING appears to be a universally accepted objective of education though we are fre- 
quently unclear as to what we mean by it and to what extent we wish to live with its consequences. As 
has been pointed out elsewhere (5), various definitions of critical thinking seem to encompass some or
all of the following features: 

1. Use of scientific methods, including emphasis on evidence and the nature of hypotheses. 

2. The tendency to be inquisitive, critical, and analytical with respect to issues, personal behavior, etc.
A derivative of this attribute is lack of susceptibility to propaganda. 

3. Use of correct principles of logic. 

The emphasis is on the development of that elusive philosophical idea, the rational man. 

With respect to methods of fostering critical thinking, two major approaches have been advocated. 
The first is "progressive education." Critical thinking is presumed to be but one of the objectives which 
are fostered by a greater degree of self-determination, flexibility of curriculum, and freedom of behavior. 
The results of the Eight Year Study provided some support for this position. Further support of an indirect 
type is provided by studies which indicate that questioning and critical behaviors are less likely to occur 
in rigid, highly formalized situations wherein deviation is punished (2).

The second approach emphasizes the tools rather than the attitude of critical thinking while recogniz- 
ing the importance of a milieu conducive to the use of the tools. Thus, emphasis is placed on acquainting 
students with the principles of logic and experimentation and with their use. It is this approach toward 
which this study was directed. 



*From the Journal of Educational Research (July-August 1963), pp. 529-534. Reprinted by permission
of the authors. 

**At the time of the study.

Method



The basic design of the study was as follows: 

It involved seven teachers of U.S. History (eleventh grade) in three Salt Lake City high schools who in-
troduced the curriculum modifications and an additional two who served as controls. During the first year,
one class (selected at random) taught by each of the nine teachers was tested in the fall and again in the
spring to establish the amount of gain to be expected over a year's time under the present curriculum.
The tests used were the Cooperative U.S. History Test, the Watson-Glaser Test of Critical Thinking, and
the I.D.S. Critical Thinking Test. During the summer of 1960, the experimental teachers attended a one
week workshop on the University of Utah campus under the direction of Dr. Haubrich, during which time
they received training in the curriculum procedures and materials presently available as well as ex-
perience in the development of new materials. During the following academic year, two of their classes
were again tested in the fall and spring as were those of the control teachers. During this year, the staff
members worked with the teachers in the utilization and development of materials. The resulting data per-
mitted comparisons of gains made from year to year under the same teacher and from teacher to
teacher within a given year.

The statistical analysis used was analysis of covariance, which permits comparison of end-of-year 
scores, adjusted for beginning-of-year scores, under the different treatments. Thus (in effect) the mean 
gain achieved by the experimental teachers during the first year (regular curriculum) is compared with 
the mean gain achieved under the modified curriculum. Further, the mean gain achieved by the 
experimental teachers using the modified curriculum is compared with the mean gain achieved by the 
control teachers during the same year. 
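
The adjustment described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `ancova_adjusted_means` and the six score pairs are hypothetical, and the sketch assumes a single covariate (the fall score) with one pooled within-group slope. Each group's spring mean is shifted to where it would be expected to fall had the group started at the overall fall mean.

```python
from statistics import mean

def ancova_adjusted_means(groups):
    """groups maps a group name to a list of (fall, spring) score pairs.

    Returns each group's spring mean adjusted to the grand fall mean,
    using the pooled within-group regression slope of spring on fall.
    """
    sxx = 0.0  # pooled within-group sum of squares of fall scores
    sxy = 0.0  # pooled within-group sum of cross-products
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)

    adjusted = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        adjusted[name] = my - slope * (mx - grand_x)
    return adjusted

# Hypothetical scores, invented for illustration only.
scores = {
    "regular": [(8, 9), (10, 11), (12, 13)],   # mean gain of 1
    "modified": [(6, 9), (8, 11), (10, 13)],   # mean gain of 3
}
adj = ancova_adjusted_means(scores)
print(adj)  # regular -> 10.0, modified -> 12.0
```

Both hypothetical groups have the same raw spring mean (11), yet the adjusted means differ by 2 points, which is exactly the difference in mean gain. That is the sense in which the analysis compares gains "in effect"; the equivalence is exact only when the pooled slope equals 1, as it does in this contrived data.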

Curriculum Modifications 

The overall plan of curriculum modification called for the teaching of a unit in "critical thinking" 
followed throughout the year by application to the content of the course as rather broadly defined. As an 
example, the students were encouraged to examine their textbook, their newspapers, and their teachers for 
examples of fallacious logic. This approach has been extensively developed in the Illinois Curriculum 
Program under the direction of B. Othanel Smith and his associates. In a comprehensive application of 
the plan in Illinois, a total of 36 teachers and approximately 1,500 high school students in English, 
geometry, science, and social studies classes participated. As of this writing, only a preliminary report 
has been published (5). It appears that the study was carefully conducted and that the students 
experiencing the experimental method showed greater gain on measures of critical thinking than the 
control group without showing impairment in mastery of course content. 

Thus, the present study is, to a large extent, a replication of the Illinois study to determine whether 
similar results are obtained, a procedure woefully lacking in educational research. In addition, the 
present study contains some methodological improvements, notably the use of a "base line" for gauging 
change which is based on the same teachers who institute the curriculum changes. 

For convenience, the curricular practices may be divided into (1) materials presented during the unit 
on critical thinking, and (2) materials used throughout the remainder of the year. 

1. Unit on critical thinking. This unit required approximately three weeks for all teachers and was 
conducted, at the teachers' convenience, sometime during the second or third month of school. 

The sequence of presentation varied from teacher to teacher but included the following topics and in 
this general order: 

a. Definitions - abstract and concrete 

b. Logical fallacies - post hoc fallacy, etc. 

c. Deductive principles 
   Syllogisms 
   If-then statements 
   Validity and truth 

d. Inductive principles 
   The nature of evidence 
   Analysis of arguments including recognition of implicit assumptions 
   Reliability of sources 

In addition to their notes and experiences during the workshop, the teachers were provided with 
copies of Applied Logic by Little, Wilson, and Moore and copies of Guide to Clear Thinking developed 
by the Illinois Curriculum Program. Also, it was intended that each student be provided with or have 
access to A Guide to Logical Thinking by Shanner. In one school, however, a misunderstanding resulted in 
these booklets not being available to all students. 

As can be seen from the topics listed above, the intent was to present to these students many of the 
more salient developments in the areas of logic, semantics, and philosophy of science, but in a fashion 
which they would comprehend. 

2. Application. Throughout the remainder of the year, the teachers attempted to utilize the ideas and 
skills taught during the unit whenever feasible. To this end many of the exercises developed by the Illinois 
group were used. Also, the teachers showed considerable ingenuity and expenditure of effort in materials 
which they developed. Some of the flavor of the materials may be conveyed by the following illustrative 
exercises. 

a. A statement on page 11 of the text states: "The Articles of Confederation granted considerable 
power to a Congress of the United States." Is this definition, explanation, or opinion? What criteria are 
provided? 

b. Analyze the argument for unfair advantages of big business on page 368 of the text. Are there 
irrelevancies? Fallacies? Do the reasons justify the conclusion? 

c. Is there a fallacy in the following argument? Life under a strong central government in Great Britain 
was tyrannical. We must not allow a strong central government to develop in this country. 

Tests Used to Evaluate Outcomes 

The measuring devices used to assess the outcomes of the program included the Watson-Glaser 
Critical Thinking Appraisal and the I.D.S. Critical Thinking Test, both constructed to assess skills in 
critical thinking, and the Cooperative U.S. History Test, which was used to assess change in the more 
typical content of the course. 

1. Watson-Glaser. This test was originally published in 1942 and was revised in 1956. It contains five 
sub-tests: inference, assumptions, deductions, interpretation, and arguments. It has been used in 
numerous studies and is quite adequate in terms of technical considerations such as reliability, norms, 
etc. Ennis (3) has, however, questioned its validity on the grounds that some items are questionable and 
that it gives too high a score to the "chronic doubter." 

2. I.D.S. Test. This test was developed in 1957 by Ennis, in part as an attempt to overcome his 
objections to the Watson-Glaser. As such the items are, on logical grounds, superior. Preliminary data 
suggest that it is adequate from a technical standpoint. 

3. Cooperative U.S. History Test. This test is considered to be one of the best standardized tests of 
the typical content of American History courses. It contains items designed to test knowledge of historical 
facts; understanding of cause-and-effect relationships, trends and developments; and ability to recognize 
chronological relationships, interpret historical maps, and locate historical information with emphasis on 
political and diplomatic history. It is somewhat weak in the area of contemporary affairs. 

Results 

Results of the analysis of covariance comparing students of the experimental teachers for the two 
years are shown in Table 1. Table 2 shows the analysis of covariance comparing experimental and 
control classes for the second year only. Table 3 shows the means of the various groups as well as some 
additional data pertaining to the I.D.S. Test. Tables 4 and 5 show mean values for the Watson-Glaser and 
Cooperative U.S. History Test, respectively. These data support the following interpretation: 

TABLE 1 

ANALYSIS OF COVARIANCE - EXPERIMENTAL TEACHERS ONLY* 

Source of Variance               Σx²     Σxy     Σy²    d.f.   Adj. Σy²    M.S.       F       P 

I.D.S. Test 
  Between years (curricula)        4      27     189       1        159     159    8.83    <.01 
  Between teachers               423     347     428       6        168      28    1.56 
  Interaction                    187     229     489       6        287      48    2.67    <.05 
  Residual                      8865    5066   10245     406       7350      18 
  Total                         9479    5669   11351     419 

Watson-Glaser 
  Between years (curricula)       44      58      77       1         15      15     .32 
  Between teachers               888     732     654       5         59      12     .26 
  Interaction                   1219    1374    1732       5        373      75    1.59 
  Residual                     28042   20298   31108     348      16414      47 
  Total                        30193   22463   33570     359 

Cooperative U.S. History Test 
  Between years (curricula)      904    -104      12       1        899     899   18.65 
  Between teachers              1530    1235    1200       6        216      36     .75 
  Interaction                    634     725    1083       6        291      48    1.00 
  Residual                     23988   21601   39027     406      19575      48 
  Total                        27056   23457   41322     419 

*With the exception of the F column, decimals have been omitted to simplify the tables. 
Cases were deleted at random so as to obtain samples of 20 each for each teacher for year 1 and 40 for 
each teacher for year 2. This procedure necessitated dropping the classes of one teacher from the 
Watson-Glaser analysis, since only 12 students took both test and re-test during year 1. 
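
The adjusted sums of squares in Table 1 can be checked from the raw columns. The sketch below assumes the first three numeric columns are the sums of squares and cross-products for the pretest (x) and posttest (y), and it uses the standard textbook analysis-of-covariance computation, which may or may not be exactly how the authors worked. The function name `adjusted_ss` is illustrative, not from the study.

```python
def adjusted_ss(sxx, sxy, syy):
    """Sum of squares of y left after removing its linear regression on x."""
    return syy - sxy ** 2 / sxx

# I.D.S. Test rows of Table 1, as (sum x^2, sum xy, sum y^2).
residual = (8865, 5066, 10245)
between_years = (4, 27, 189)
residual_df = 406

adj_residual = adjusted_ss(*residual)

# The adjusted effect is the adjusted (effect + residual) minus the adjusted residual.
combined = tuple(e + r for e, r in zip(between_years, residual))
adj_between = adjusted_ss(*combined) - adj_residual

ms_residual = adj_residual / residual_df
f_ratio = (adj_between / 1) / ms_residual  # the between-years effect has 1 d.f.

print(round(adj_residual), round(adj_between), round(f_ratio, 2))
```

Rounded to whole numbers, the two adjusted sums of squares agree with the tabled 7350 and 159. The F ratio computed from unrounded mean squares comes out slightly below the tabled 8.83; the published value is consistent with dividing the rounded mean squares (159 by 18), though that is an inference rather than something the report states.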

TABLE 2 

ANALYSIS OF COVARIANCE 
EXPERIMENTAL VS. CONTROL TEACHERS 

Source of Variance               Σx²     Σxy     Σy²    d.f.   Adj. Σy²    M.S.       F       P 

I.D.S. Test 
  Between groups                 124     216     375       1   [illegible] [illegible]   7.22    <.01 
  Residual                      8203    5066   11352     386       8222      21 
  Total                         8327    5282   11727     387 

Watson-Glaser 
  Between groups                 254     206     167       1         24      24     .47 
  Residual                     12262   13759   35229     391      19790      51 
  Total                        12516   13965   35396     392      19814 

Cooperative U.S. History Test 
  Between groups                 226     314     435       1         49      49    1.09 
  Residual                     22198   20397   36224     386      17481      45 
  Total                        22424   20711   36659     387      17530 








TABLE 3 

MEANS OF EXPERIMENTAL AND CONTROL GROUPS 
IN THE PRESENT STUDY AND OF OTHER COMPARISON GROUPS 
ON THE I.D.S. TEST 

                                                     Mean     Mean 
Group                                         N      Fall    Spring    Gain 

Experimental Teachers - 
  Regular Curriculum - Year 1               140       8.8     10.2      1.4 
Experimental Teachers - 
  Modified Curriculum - Year 2              280       9.1     11.8      2.7 
Control Teachers - 
  Regular Curriculum - Year 1                36       6.8      8.4      1.6 
Control Teachers - 
  Regular Curriculum - Year 2                53       7.5      9.0      1.5 

Other comparison groups                              Mean 

Normative Data - High School Juniors*                 9.0 
Normative Data - High School Seniors*                 9.6 
College Educational Psychology Students*             12.3 
High School Students in Courses 
  Emphasizing Critical Thinking*                     12.1 

*Ennis, R.H. "Interim Report: The Development of the I.D.S. Critical Thinking Test." 

TABLE 4 

MEANS OF EXPERIMENTAL AND CONTROL GROUPS 
ON THE WATSON-GLASER TEST 

                                                     Mean     Mean 
Group                                         N      Fall    Spring    Gain 

Experimental Teachers - 
  Regular Curriculum - Year 1               120      62.3     64.9      2.6 
Experimental Teachers - 
  Modified Curriculum - Year 2              240      61.6     64.0      2.4 
Control Teachers - 
  Regular Curriculum - Year 1                30      56.8     60.0      3.2 
Control Teachers - 
  Regular Curriculum - Year 2                53      59.6     62.0      2.4 



TABLE 5 

MEANS OF EXPERIMENTAL AND CONTROL GROUPS 
ON THE COOPERATIVE U.S. HISTORY TEST 
(STANDARD SCORES: X = 50, S = 10) 

                                                     Mean     Mean 
Group                                         N      Fall    Spring    Gain 

Experimental Teachers - 
  Regular Curriculum - Year 1               140      44.1     49.3      5.2 
Experimental Teachers - 
  Modified Curriculum - Year 2              280      41.3     49.7      8.4 
Control Teachers - 
  Regular Curriculum - Year 1                36      44.1     47.6      3.5 
Control Teachers - 
  Regular Curriculum - Year 2                51      39.5     46.9      7.4 



1. I.D.S. Test. 

a. Considered as a group, students of the experimental teachers showed significantly greater gain 
(p < .01) the second year, i.e., under the modified curriculum, as compared to the previous year. The 
amount of the difference, when compared to available norms, indicates the improvement to be of 
practical importance. The students under the revised curriculum began the year with a mean score very 
near that typical of eleventh graders and, by the end of the year, scored at a level almost up to that of a 
sample of unselected college students and almost as high as previously reported groups in high school 
classes emphasizing critical thinking. Students of these teachers but without the revised curriculum 
showed the amount of gain to be expected during the course of a year. Both groups began the year with 
nearly identical mean scores. 

b. The significant (p < .05) teacher-by-method interaction suggests that the curricular modifications 
are more effective with some teachers than with others. 

c. When students experiencing the revised curriculum were compared with students in the regular 
curriculum (during the same year, different teachers), they showed significantly greater gain (p < .01). The 
gain for the students in the regular curriculum (two teachers) was almost identical for the two years. 

It seems legitimate to conclude that the revised curriculum had a rather marked effect on critical 
thinking as measured by the I.D.S. Test. 

2. Watson-Glaser Test. 

a. The results for this test do not support the I.D.S. Test results. There is essentially no difference 
between the two groups of students taught by the experimental teachers in amount of gain; in both years 
the gain is 2.8. The group experiencing the modified curriculum was slightly lower on the fall testing. For 
the first year group, the gain is from a percentile score of 77 to 83, while for the second year group 
(modified curriculum), the gain is from the 74th to the 81st percentile rank based on high school norms. 
Grade equivalent scores are not available for this test. 

b. The comparison of experimental and control groups during the second year only is consistent with 
the foregoing analysis in showing no significant difference between the groups. 

The results for this test provide no evidence for the modified curriculum. This finding is particularly 
disappointing in light of the fact that the Illinois study did find a significant superiority in amount of gain 
shown on this test by the students in the experimental group. 

3. Cooperative U.S. History Test. 

a. Students under the modified curriculum made significantly more gain during the year than did 
students with the same teachers during the preceding year (p < .001). In both instances, the students at the 
end of the year scored slightly below national norms. The experimental group, however, scored 
considerably lower at the beginning of the year. 

b. The experimental group (modified curriculum) showed more gain than the control group during the 
same year, but not significantly so. 

c. The control teachers achieved significantly (p < .05) more gain the second year. 

d. The gain of the experimental teachers was not significantly greater than the gain achieved by the 
control teachers during the second year. Because of the gain achieved by the experimental teachers, we 
are tempted to suggest that the curricular modifications may have fostered greater interest and/or skill in 
dealing with the course content, hence, greater mastery. But since the gain was not significantly greater 
than that achieved by the control teachers during the second year, it is possible that other factors were 
operative, possibly that the second year students began the year with somewhat poorer background. It is 
clear that the modifications did not result in a decrease in the mastery of course content. 

Reactions of Teachers, Students, and Parents. An additional measure of the outcomes of a plan 
such as this is to be found in the reactions of persons involved in it. Although no systematic attempt was 
made to collect such data in the present study, some information almost inevitably is present. It is 
recognized that impressions such as those which follow are subject to many criticisms on the grounds of 
selective sampling and bias of several kinds; they are nevertheless presented as valuable, though for the 
most part subjective, data. 

1. The seven experimental teachers have all expressed considerable enthusiasm for the program as an 
interesting and worthwhile attempt in an important area, though some are quite skeptical as to the results 
achieved, particularly among the less able students. Even accounting for the expected desire to comfort 
the researchers and to justify their own efforts, it is our opinion that this represents an honest reaction on 
the part of the teachers. One bit of supportive data is that they have all indicated an intention to use at 
least part of the materials next year and have expressed the hope that further work of this kind will be 
undertaken. 

The consensus seems to be that the material on fallacies and definitions was easiest to put across, 
with the material on syllogisms the most difficult, as would be expected. As to organization of 
presentation, some of the teachers indicated that they would prefer to spread the topics out during the 
year and introduce them as smaller units. One teacher would, in the future, not teach the material as a 
distinct unit but rather would attempt to incorporate it throughout the course. 

2. As reported by the teachers, the reaction of students was varied. Some expressed the view that it 
was difficult. Others wondered what it was for, i.e., "Why don't we just have history?" Our expectation 
was that some students would be psychologically threatened by the material; this seems to have been 
the case but to a lesser extent than we expected. On the other hand, some became intrigued and enjoyed 
it. Several teachers reported students making use of the material in arguments and particularly in debate, 
though some of the same material frequently is presented in debate (and in psychology courses). Several 
incidents of carryover to other activities were reported: 

a. Letters were written to several advertisers and to a weather man requesting definition of terms. The 
former were not satisfactorily answered; the latter was, and in some detail. 

b. As a result of a difference of opinion in class regarding a syllogism, several students wrote to a 
professor of philosophy at the University of Utah for clarification. 

3. There appears to have been little reaction from parents. As expected, some parents feared that 
knowledge of history was being sacrificed for some new silliness, but the teachers were able to provide 
an explanation which was at least in some cases considered adequate. 

We had expected some objection from parents along the lines that their children were beginning to 
question some of the eternal verities. That this did not happen may be attributable to the parents' con- 
fidence in the schools, to parental indifference, or to lack of impact of our program. 

Summary 

This report describes a two year project which introduced into three high schools a curriculum plan 
designed to foster critical thinking and which attempted to assess its effectiveness. The curriculum plan 
was patterned after a similar program developed at the University of Illinois and consisted of the 
presentation of a three week unit on the tools of logical analysis, semantics, and scientific method at a 
level appropriate to eleventh graders, followed by application of these tools to the content of the course 
in U.S. History throughout the year. The seven participating teachers were provided a workshop prior to 
the introduction of the unit and were provided the services of the project staff, as well as the benefits of 
several group discussions during the year. Their interest and effort expended in the project were such as 
to leave no question but that the approach received an adequate trial. 

The results of the evaluation demonstrate quite clearly that mastery of the typical content of the U.S. 
History course was not impaired by the curriculum modification. The effectiveness of the program in 
fostering critical thinking is not unequivocally demonstrated, since one of the tests to assess this change 
did not show any difference between experimental and control groups. The other test, however, which on 
logical grounds may be argued to be a better test, did show rather impressive differences in favor of stu- 
dents who received the revised curriculum. Further, the reactions of teachers and students, though not in- 
tensively studied, strongly support the value of the program. 

The Analysis 

Type of Study: quasi-experimental. The 
researchers did have control over the treatment but 
did not use random assignment of either students 
or teachers to treatment groups. 

Justification. The authors relied on the "current 
acceptance" of its importance to justify studying 
critical thinking. While this is often done, we believe 
a reader deserves a more thorough treatment, per- 
haps something like the following: 

Many respected thinkers, including 
Dewey, Adler, Toffler, and Taba, have 
defended the necessity of students' learning 
to be critical thinkers rather than passive 
channels for the transmission of information. 
The rate of information generation 
is such that no one can expect to master 
even a limited content area for more than a 
very short time. In academic areas, therefore, 
one must learn to evaluate new information 
and to see its relationship to previous 
knowledge. In the more general 
arena of daily life, the necessity for citizens 
of a democracy to sift and evaluate com- 
peting claims for their allegiance and en- 
deavors has been recognized since the 
framing of the U.S. Constitution. 

One might also expect some rationale for the 
teaching method involved. While implied in the 
existing report, a more explicit statement might be 
the following: 

If our definition of critical thinking is 
accepted, one teaching approach that is 
immediately suggested is direct instruction in 
the component skills (e.g., the recognition 
of logical fallacies). Each skill is presented 
to students in a manner commensurate 
with their level of knowledge; opportunities 
to practice the skills and receive feedback 
are provided. 

Further, one might expect to find an exposition 
of the implications of study outcomes for theory 
and practice: 

If it is shown that the method is effective, 
additional support is provided for 
those wishing to disseminate it more widely. 
Teachers and others will have reason to 
expect that the desired outcomes will, in 
fact, occur. Further, such results would 
also support the general theory, espoused 
by Bruner and others, that high school students 
are capable of learning content customarily 
taught in college. Finally, additional 
evidence would exist to support the 
proposition that critical thinking can be 
taught in a straightforward manner to all 
high school students in much the same 
ways as other, more typical content, rather 
than depending on greater maturity or special 
talent on the part of students or 
teachers. 

Lastly, the authors should have, at the outset, in- 
dicated that the study was a replication of other 
work and provided more details regarding the prior 
study. They could not have been expected to 
review all of the studies pertaining to critical 
thinking prior to that time, but some additional 
references would have been helpful. In reality, the 
systematic review of related literature is a distasteful 
task for many researchers, or so we believe, and 
hence is often done, as here, in a cursory fashion. 

We see no reason for concern about the ethical 
implications of the study, though the authors did 
state, near the end of their report, that they had an- 
ticipated some objections from parents because 
students were being encouraged to question com- 
monly held assumptions. Discussion of the 
philosophical/political ramifications of this issue is 
beyond the scope of a research report, but the 
authors might have explained why they had such 
expectations. 

Clarity. The focus of the study seems clear: to 
obtain evidence of the extent to which the 
curriculum modifications improve critical thinking in 
high school students and affect acquisition of 
customary knowledge of history. The primary outcome 
variable, "ability to think critically," is clear at the 
outset. Other outcome variables, however, were 
not mentioned until near the end of the study. 
These variables (reactions of teachers, students, 
and parents) should have been mentioned in the 
introduction. 

In any study involving a complex treatment such 
as this, it is virtually impossible to convey all of the 
intricacies of the method involved. In our opinion, 
the authors presented as good a description as 
could be expected. 

Hypotheses were not stated explicitly. We would 
argue that they should have been, since the study 
was clearly intended to test the efficacy of a par- 
ticular method. The following six hypotheses were 
clearly implied, however. 

During the new curriculum year, as compared to 
the preceding year, the classes of the experimental 
teachers will demonstrate: 

1. Greater gain on the Watson-Glaser 
Test. 

2. Greater gain on the I.D.S. Test. 

3. Approximately the same amount of 
gain on the Cooperative U.S. History Test. 

During the same year, classes taught 
the new curriculum (the "experimental 
group" teachers), as compared to classes 
taught the usual curriculum (the "control 
group" teachers), will demonstrate: 

4. Greater gain on the Watson-Glaser 
Test. 

5. Greater gain on the I.D.S. Test. 

6. Approximately the same amount of 
gain on the Cooperative U.S. History Test. 

Definitions. No specific section on definitions 
was provided. The authors did provide somewhat 
of a constitutive definition of critical thinking. 
However, the statement that various definitions 
encompass "some or all" of these features is 
imprecise. Did the authors intend to include all the 
features? If not, which ones? Further explication 
would have been helpful, especially of the "correct 
principles of logic." Additional clarity could easily 
have been achieved by defining critical thinking 
operationally as the scores on the Watson-Glaser 
and I.D.S. tests. The essentials of the curriculum 
modifications are probably clear "in context" later 
in the report, but might have been called to the 
reader's attention earlier. Items a-d at the bottom 
of page 38 might well have been given as the 
definition of critical thinking, since they are more 
specific both as to the intent of the curriculum and 
its content. 

Sample. The sample was clearly not obtained in 
a random manner, including as it did a total of nine 
teachers in three high schools and a total of 27 
intact classes of students, all in one particular city. 
The authors did not argue for representativeness, 
since they would have had to offer evidence that 
the teachers and students were similar to a popula- 
tion of interest in some important ways (e.g., ability 
level, socioeconomic level of the students, years of 
experience of the teachers). In fact, the mean 
scores on the Cooperative U.S. History Test (see 
page 42) suggest that the student sample was very 
similar to, but slightly below, the normative group 
for that test. The sample, then, was a convenience 
sample, with all of its inevitable limitations. 

Whether or not the authors wanted to argue for 
the generalizability of their results, they should 
have provided some demographic data. For 
example, the ethnic makeup of both the teacher and 
student samples can be presumed (from the 
location of the study) to be predominantly Anglo (as, in 
fact, was the case). Further, many readers would 
likely infer (again because of location) that the 
attitudes of the teachers would be highly 
conservative (though this was not the case). Since these 
variables would be expected to influence 
outcomes, some information on them should have 
been given. While the sample of students is large 
in all comparisons of interest, the sample of 
teachers is not (only seven experimental and two 
control). While actually larger than in many studies 
of this type, this sample size, particularly that of 
the control group, presents further limitations. 

Threats to Internal Validity 

History. It is always conceivable that one or 
more other factors, instead of the independent 
variable, may be responsible for the outcome(s) of a 
study. In this study, such factors might have 
included the availability of additional resources to 
experimental classes but not to control classes, a 
schoolwide disruption (e.g., a teachers' strike) 
during year one of the study, and the introduction 
of critical thinking materials into the physical 
science curriculum during year two. In any study, 
one must rely on the integrity and acumen of the 
researchers to identify and discuss such factors. 
Since none were mentioned here, we can only infer 
that none were known to the researchers. The 
study design, comparing groups both across 
years and within the same year, is probably the 
best way to rule out such possibilities, since they 
would not have been expected to favor the "new 
curriculum" group under both circumstances. 

Maturation of students would have affected all 
comparison groups in the same way, since the 
pre/post interval was the same for all. Maturation of 
teachers might have accounted for the superiority 
of year two over year one results if the teachers 
were relatively inexperienced (this was not the 
case), but would have been unlikely to have 
accounted for differences in year two alone. 

Mortality in students would not have been 
expected to favor the new curriculum groups, since it 
occurred either by absence from class or by 
random deletion. 

Subject characteristics are always of concern 
when random assignment of sizable numbers of 
subjects is not used. Analysis of covariance and 
similar techniques (e.g., analysis of regressed gain 
scores) do make it possible to match groups with 
respect to measured variables (in this case, pretest 
scores), but cannot ensure comparability on other 
variables, such as student attitude toward social 
studies or interest in analytic processes. Further, 
such analyses make mathematical assumptions 
(such as how to determine the "best fit" line), 
which are themselves subject to sampling error. 

In this study, the researchers should have 
identified those subject variables that were likely (1) to 
affect the outcome variable(s) and (2) to be 
different for the comparison group. They should then 
have attempted to measure these variables and 
incorporate them into the analysis. That this is easier 
said than done is illustrated by the difficulty of 
getting socioeconomic data (probably one of the most 
important variables to control). 

In a methods study such as this, researchers 
must also be concerned about possible differences 
between teachers of the two groups; perhaps the 
experimental teachers were just better teachers 
than the control teachers. Use of the same 
teachers for both methods, as in part of this 
design, is the best way to control for this threat. 

Pretesting should not have given an advantage 
to the new curriculum group, since it was done in 
all groups. One might argue that the pretest 
interacted with the method to result in an advantage to 
the experimental group, but this seems unlikely, in 
that the pretest items were only a sample of the 
tasks emphasized all year long. Omission of the 
pretest would have eliminated this possibility at the 
sacrifice of statistical matching of groups. 

A regression effect is unlikely, since extreme 
groups were not used. If anything, such an effect 
would favor the control group during year two, 
since it had lower pretest scores. 
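
Why a group with lower pretest scores would be expected to gain more from regression alone can be seen in a small simulation. This is a sketch under invented assumptions, not data from the study: every simulated student has a stable "true" score plus independent measurement error on each testing, and no treatment is applied. The function name and all numeric settings are hypothetical.

```python
import random

def mean_gain_by_pretest_extreme(n=2000, seed=0):
    """Simulate pre/post scores with no treatment effect.

    Returns the mean gain of the lowest-pretest tenth and of the
    highest-pretest tenth of simulated students.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(50, 8)        # stable ability (assumed)
        pre = true_score + rng.gauss(0, 6)   # pretest = ability + error
        post = true_score + rng.gauss(0, 6)  # posttest = ability + new error
        pairs.append((pre, post))

    pairs.sort(key=lambda p: p[0])           # order students by pretest
    k = n // 10
    low, high = pairs[:k], pairs[-k:]

    def mean_gain(group):
        return sum(post - pre for pre, post in group) / len(group)

    return mean_gain(low), mean_gain(high)

low_gain, high_gain = mean_gain_by_pretest_extreme()
print(f"lowest tenth: {low_gain:+.1f}   highest tenth: {high_gain:+.1f}")
```

With nothing but measurement error at work, the lowest-scoring tenth shows a positive mean gain and the highest-scoring tenth a negative one. A comparison group selected with lower pretest scores therefore gets a small head start on gain, which is the direction of bias noted above.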

An instrumentation effect resulting in bias seems 
unlikely, since instrumentation was the same for all 
groups. It is conceivable that the new curriculum 
students might do more poorly on the posttest 
because of increased critical ability, but this would be 
contrary to the hypotheses and the outcomes 
obtained. It seems unlikely that bias would be 
introduced by test scoring, since all tests were 
machine-scored. Information on test administration should 
have been included, however. Administration of 
tests by teachers is notorious for violations of 
standard testing procedures. Had this been the 
case, one might suspect the experimental teachers 
of giving assistance or additional time in taking the 
tests. This, of course, would have favored their 
students. In actuality, this threat was eliminated, since 
project-trained assistants administered all tests. 

A Hawthorne effect was a major concern in this 
study. Since both teachers and students knew that 
they were part of a special project and since the ex- 
perimental teachers received special summer train- 
ing, it could be argued that this special attention ac- 
counted for the results obtained. The only way to 
control for this threat would have been to provide 
similar special attention to the control groups. 

An order effect would not apply to the students
in this study, but it might be thought to have af-
fected teachers, since they were involved during
successive years. It seems unlikely, however, that
second-year gains were larger because of first-year
participation, since the first year consisted only of
testing and organizational meetings.

The authors of the study did pay some attention
to these threats, although in a rather general way.
They did state that the design of the study per-
mitted analysis of gains made from year to year by
the same teacher and from teacher to teacher
within a given year, but they did not indicate what
specific threats were addressed by this design.
They also described how analysis of covariance
matched the groups on pretest scores, but again
they did not indicate what threat this controlled.
Beyond these statements, there was no discussion
of internal validity.

As to whether the treatment received an ade- 
quate trial, no in-class observation was reported. 
The statement that the project staff met periodically
with the teachers throughout the year, however,
combined with the examples of assignments
developed and of teacher and student reactions,
leads us to conclude that the tryout was adequate
in terms of substance. One year appears to be
ample time for implementation of the curriculum.

Reliability of Instruments. The authors did a 
poor job of addressing reliability. They are guilty of 
the typical "quick shuffle" in stating that usage and 
"other evidence" were sufficient. They should, at 
the very least, have reviewed previous evidence as 
to type of reliability and the magnitude of reliability 
coefficients and then assessed their applicability to 
this study. Since the student sample appeared to 
be quite similar in performance on these tests to 
available norm groups, prior data might have been 
applicable. However, there is still no excuse for not 
reporting internal consistency coefficients, since 
they could easily have been obtained from the data 
available. While pre/post correlations are somewhat 
misleading as indicators of reliability in a treatment 
study (since one expects inconsistency pre to 
post), they are nevertheless of interest, particularly 
for comparison among the groups. If the new cur-
riculum turned out to be effective, one might ex-
pect less pre/post consistency for students ex-
posed to this curriculum, since new treatments are,
by their nature, trying to disturb the predictable pat- 
tern of development. 
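
As an illustration only of how readily an internal consistency coefficient can be computed from item-level data, the following minimal sketch calculates coefficient alpha (Cronbach's alpha) in Python. The function name and the response matrix are hypothetical; the study's own item data are not available here, and nothing in the report indicates which coefficient its authors would have chosen.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a table with one row per student, one column per item."""
    k = len(item_scores[0])                      # number of test items
    items = list(zip(*item_scores))              # regroup scores item by item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: 5 students, 4 items scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Computing the coefficient separately for each treatment group would also permit the kind of comparison among groups suggested above.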

Validity of Instruments. The authors provided a 
brief logical analysis of the two critical thinking 
tests. They did not, however, discuss these tests in 
relation to the curriculum modifications introduced 
in the study. Readers can make their own com-
parisons between the five subtests of the Watson-
Glaser test and the outline of curriculum topics, but
they should not have to do so. It appears that all
five subtests have logical relevance to the cur-
riculum topics, but that two topics (definitions and
reliability of sources) may not have been tested. 
The authors had a responsibility to defend their 
use of this test as it relates to the content taught. 

Even less information was provided on the
validity of the I.D.S. test. While the use of inde-
pendent judges to assess the validity of these tests
for the purposes of this study may be less crucial
than in many studies reviewed in this monograph,
it would have greatly strengthened the authors'
report.

Finally, the authors neglected to report a very 
useful piece of information. They had a built-in em- 
pirical check on validity -the correlation between 
the two tests -which could easily have been ob- 
tained from the data at hand. It would be very help- 
ful to have this correlation (both pre and post) 
separately for each major treatment group. The 
results of the group comparisons did suggest that 
these correlations were not high, but the details are 
important. 

External Validity 

Population Generalizability. To their credit, the
authors did not specifically overgeneralize their
results to "teachers" and "students," but rather
phrased both their discussion of results and their
summary in terms of the outcomes obtained for
the teachers and students involved in the study.
They failed to discuss the serious limitations im-
posed by their convenience sample, however. Also,
their use of inferential statistics without qualification
implies, we believe, that they thought their results
were generalizable. They did mention that the
study was a partial replication and that the repli- 
cated data did not support the previous findings, 
but we would argue that they should have included 
a statement somewhat like the following at the end 
of their summary: 

In total, our evidence indicates that fur- 
ther use and study of this method are war- 
ranted. We found no evidence of negative 
effects and some evidence of positive im- 
pact. Since, however, our results were 
equivocal and, in specifics, inconsistent
with a prior study, and since our sample 
does not permit generalization to a defined 
population, these results must be treated 
as tentative. 

Ecological Generalizability. The authors made 
no comments about the ecological generalizability 
of this study. They did not commit the (not uncom-
mon) error of recommending this method in all so-
cial studies courses or at a variety of grade levels
or in the absence of a university support system, 
but neither did they warn against such over- 
generalization. 

Results and Interpretations. The authors 
generally maintained a clear distinction between 
results and interpretations. When presenting the 
results for the U.S. history test, an interpretation 
was made, but it is clear that it was an inference 
going beyond the data. One might, however, ques-
tion the inclusion of the section on reactions of
teachers, students, and parents in the "Results"
section, since some of the interpretations made 
there are not supported by the data presented. 
Overall, the data did, in our opinion, justify the inter- 
pretations made. 

Data Analysis 

Descriptive Statistics. The statistics presented 
are appropriate, although the omission of standard 
deviations is unfortunate. Standard deviations are 
important, in that they permit the assessment of 
the magnitude of the differences in mean gain. This 
is particularly important when inferential statistics 
are suspect (see below). Effect size also should 
have been reported. 
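
To make the role of standard deviations concrete, the following minimal Python sketch computes one common effect size: the difference between two mean gains divided by the pooled standard deviation. The function name and the gain scores are hypothetical and are not taken from the study; other effect size indices exist, and this one is shown only as an example.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(group_a, group_b):
    """Difference in means expressed in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical gain scores (posttest minus pretest) for two small groups.
experimental_gains = [6, 9, 4, 8, 7, 10, 5, 7]
control_gains = [3, 5, 2, 6, 4, 5, 1, 6]

effect_size = standardized_mean_difference(experimental_gains, control_gains)
print(round(effect_size, 2))
```

Without the standard deviations, a reader cannot carry out this calculation from the published means alone.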

Another method of judging the magnitude of
change is by comparison with other groups. As the
authors pointed out, the year one and year two ex-
perimental teacher groups began at very near the
expected mean. While the year one group gained
somewhat more than might be expected from nor-
mative data, the additional gain of the year two
group (to that attained by "special groups") does 
seem sufficient to warrant the conclusions drawn. 

More serious is the failure of the authors to ex-
plore and discuss the teacher-by-method interac-
tion. This is a good example of both a preoccupa-
tion with hypothesis-testing and the political im-
plications of research. The researchers felt justified
in not reporting interaction data, since it was not 
primary to the purpose and implied hypotheses of 
the study. In actuality, however, means were ex- 
amined and the results found to be somewhat dis- 
concerting. The teacher who had been most en- 
thusiastic and hardworking with respect to the new 
curriculum obtained the lowest gains during year 
two, while the teacher judged to have the poorest 
grasp of the content showed the highest gains. 
These findings were not only repugnant to the re- 
searchers, but also potentially destructive to the 
morale of the teachers in the experimental group. 
Consequently, nothing was said. We believe it is 
not uncommon in educational research to ignore 
awkward results because they are inconsistent with 
the current presumed state of knowledge and/or 
have implications that are potentially detrimental to 
interpersonal relationships. This situation is exacer- 
bated by the all-too-common practice of terminat- 
ing inquiry into a topic when a research grant runs 
out. In this instance, follow-up examination of this
unexpected and distasteful outcome might have 
led to some truly significant findings. With the 
hindsight of twenty years, it now appears that the 
explanation for this result could be found in the 
quality of interpersonal relations between teacher 
and students. Unfortunately, no data were col-
lected on this variable.

Inferential Statistics. Analysis of covariance is an
appropriate procedure for this study. Since the as-
sumption of random sampling was violated,
however, the authors were obligated to indicate
that the resulting probabilities were not exact and
should be interpreted only as general indications.
This they did not do. It is legitimate to use the prob-
abilities as indicators of greater gain on the I.D.S.
test than on the Watson-Glaser, although the com-
parison of means (see Tables 3 and 4) makes the
same point. What is not defensible (although com-
mon) was the reporting and interpreting of prob-
abilities as though they could be taken at face
value.

Significance of the Study. Despite the many 
criticisms we have made of the study, we judge it 
to be significant. This judgment reflects our ap- 
praisal of the importance of the topic and the 
realization that no study is perfect. Nevertheless, 
our analysis reveals that even experienced re-
searchers can substantially improve the quality of
their research and the reporting of their findings. 
CHAPTER 5

SUGGESTIONS FOR CLASSROOM RESEARCH 
IN THE SOCIAL STUDIES 



The profession has largely overlooked one
group of individuals not only as a potential source
of valuable information about social studies, but
also as potential gatherers of such information. We
refer to those most intimately involved with social
studies subject matter, methodology, and class-
rooms: classroom teachers of social studies.

It appears to be a fact that most classroom 
teachers of the social studies do not engage in re- 
search. We found no reports of research efforts by
classroom teachers in TRSE or the JSSR for the
past ten years; only an occasional article by
teachers can be found in the research section of
SE during this same period. Similarly, recent
reviews of research in social studies education
reveal few studies by classroom teachers (e.g.,
Hunkins et al. 1977; Stanley 1985; Armento 1986).

Although it is only logical to assume that most 
social studies teachers want to improve the quality 
of their instructional efforts and thus probably ex- 
periment with new materials and approaches from 
time to time, there seems to be little desire on their 
part to engage in systematic research in their class- 
rooms or to consider research as a source of ideas 
about possible ways to improve their efforts. We 
think this is unfortunate. 

This is not, of course, a new idea (see, for ex- 
ample, Shaver 1979b; Wiley 1977). The intent of 
this section is therefore not to analyze in depth 
why social studies teachers neither engage in nor 
read research. Let us just state briefly that they
are not trained to do so in their social studies or
general methodology courses; they are not en-
couraged to do so by their supervisors or ad-
ministrators; they are not in any way rewarded for
doing so; and they are further discouraged from
such activity by the large numbers of students
(often between 30 and 40 students per class) that
most must teach. Even those few who read the re-
search literature rarely find anything that, in their
perception, relates directly to what they do in their
own classrooms.

Classroom teachers could investigate many
kinds of questions in social studies education. In-
deed, by doing so they could perform a vital ser-
vice to the profession. It is an unfortunate fact that
we still have only the haziest of ideas as to what
sorts of content, methods, learning activities, teach-
ing strategies, and evaluation devices make much,
if any, sort of a difference in social studies class-
rooms; how students "learn" social studies most ef-
fectively; what methods work best in what sorts of
situations; how to encourage and develop student
thinking about social issues; how to vary content,
methods, and activities to help students of differing
abilities; how best to sequence content so as to
maximize understanding; or even (alas!) how to in-
crease the interest of the vast majority of students
in the social studies curriculum itself.

Classroom teachers can help to provide some
answers to these (and other) important questions.
In fact, if several teachers, in different schools
within districts and even in different districts
throughout the nation, were to investigate the
same question in their classrooms, thereby replicat-
ing the research of their peers, they could begin to
establish what might become a steadily accumulat-
ing base of knowledge about important aspects of
teaching and learning in the social studies. As we
indicated earlier, such a knowledge base, though
badly needed, does not at present exist.

We believe there is another important reason
why teachers and other school personnel might
profitably conduct research: as a means of reduc-
ing burnout. In our experience, many teachers find
it difficult to maintain enthusiasm for their work 
after several years of coping with all the stresses of 
the profession. Participation in research to clarify 
questions of interest and concern might be one of 
the best ways to maintain the intellectual excite- 
ment that, for many, has been lost. 

In this chapter, therefore, we suggest
methodologies classroom teachers could use to in-
vestigate questions of interest and then describe
how a classroom teacher might apply each of them
to one or more such questions. The techni-
ques we suggest are designed not to be too
demanding of teachers' time and energy. The
methodologies hold promise for providing informa-
tion of interest and value not only to individual
teachers but to the profession as a whole.

In the examples that follow, we use the dis-
cipline of history as the source for the research
questions we present. Similar examples could be 
presented using other disciplines (political science, 
economics, etc.) or such topics as global educa- 
tion or law studies, which typically borrow informa- 
tion from a variety of disciplines. 

Experimental Research 

Suppose that a history teacher is interested in
the following question: "How can I most effectively
teach historical concepts to my students?" The
teacher might compare the effectiveness of certain
methods of instruction (e.g., inquiry, case studies,
illustrated lectures, programmed units, small group
discussions) with others in promoting the learning
of historical concepts. If conditions permit ade-
quate controls, experimental research would be an
appropriate methodology. Students could be sys-
tematically assigned to contrasting forms of history
instruction. The effects of these contrasting
methods could then be compared by testing the
conceptual knowledge of those taught. Student
learning could be assessed by an objective test,
with the validity of the test checked in some way.
The scores on the test (the dependent, or out-
come, variable), if they differ, would give us some
idea of the relative effectiveness of the methods.

In the simplest sort of experiment, where there
are two contrasting methods to be compared
(the method of instruction usually being referred to
as the independent variable),
an attempt is made to control for all other (ex-
traneous) variables, such as student ability level,
age, grade level, time, materials, teacher charac-
teristics, etc., that might affect the outcome under
investigation. Methods of control could include ran-
domly assigning students to one or the other of the
instructional groups, holding the classes during the
same or closely related periods of time, using the
same materials in both groups, comparing stu-
dents of the same age and grade level, etc. 4
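
Random assignment itself requires nothing more than a shuffled class roster. The following minimal Python sketch is one way a teacher might carry it out; the function name, the roster, and the fixed seed are hypothetical and serve only as an illustration.

```python
import random


def randomly_assign(students, seed=None):
    """Shuffle a roster and split it into two instructional groups."""
    rng = random.Random(seed)        # a seed makes the assignment reproducible
    shuffled = list(students)        # copy so the original roster is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster of ten students identified by number.
roster = [f"Student {n}" for n in range(1, 11)]
method_a_group, method_b_group = randomly_assign(roster, seed=1987)
print(method_a_group)
print(method_b_group)
```

Recording the seed (or the resulting lists) lets a colleague verify that the groups were formed by chance rather than by the teacher's judgment.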

Ideally, of course, one wants to have as
much control as possible over the assignment of in-
dividuals to the various treatment groups. As we
mentioned in Chapter 3, however, random assign-
ment of students to treatment groups is usually dif-
ficult, if not impossible, to achieve. Nevertheless,
comparisons are still possible. For example,
achievement in two or more intact history classes
in the same school, taught by teachers whose
methods differ rather dramatically (predominantly
lecture-oriented vs. discussion-oriented teachers,
for example), might be compared. Since the stu-
dents in these classes would not have been as-
signed to their classes randomly, this could not be
considered a "true" experiment. Large differences
between the classes, however, could still be sug-
gestive of how the two methods compare. 5 Further-
more, it might be possible to compare groups that
are matched on important variables, or at least on a
pretest.

Consider for a moment the study we analyzed in 
Chapter 4. With certain modifications, such a study 
could be carried out by any interested classroom 
teacher. Granted, the curriculum modifications 
were complex and required training, but any 
method a teacher wished to study could be sub- 
stituted. A minimum of two classes is required; 
secondary teachers often teach several sections of 
the same course. Elementary teachers would need
to divide their class randomly, compare succeed-
ing classes (by semesters or years), or involve a
second teacher. Obtaining the tests used, or
others, should not be a problem. Data collection is
a simple matter. Data analysis using gain scores
(from pre to post) for each student is a straightfor-
ward and relatively simple process involving only
means, medians, standard deviations, and frequen- 
cy polygons. We believe that the mechanics of car- 
rying out such a study, therefore, are by no means 
prohibitive. 
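
As a rough illustration of how little computation is involved, the following Python sketch derives gain scores for one class and summarizes them with a mean, a median, a standard deviation, and the frequency counts from which a frequency polygon could be drawn. The function name and all scores are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev


def summarize_gains(pretest, posttest):
    """Compute each student's gain (post minus pre) and basic descriptive statistics."""
    gains = [post - pre for pre, post in zip(pretest, posttest)]
    return {
        "gains": gains,
        "mean": mean(gains),
        "median": median(gains),
        "sd": stdev(gains),
        # (gain, number of students) pairs, ready to plot as a frequency polygon
        "frequencies": sorted(Counter(gains).items()),
    }


# Hypothetical pretest and posttest scores for five students, in the same order.
pretest_scores = [40, 45, 38, 50, 42]
posttest_scores = [46, 49, 45, 52, 48]

summary = summarize_gains(pretest_scores, posttest_scores)
print(summary)
```

Running the same summary for each class being compared yields the figures needed to judge the size of any difference between methods.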

The difficult, but also interesting, part is attend- 
ing to the various issues we have discussed herein 
so as to arrive at legitimate and useful interpreta- 
tions. Such efforts might well make truly significant 
contributions to the education of children and 
would go a long way toward reprofessionalizing 
teaching. We are well aware of the potential for er- 
roneous conclusions on the part of individual 
teachers, but we believe that this can be 
counteracted by the insistence that intentions, 
plans, methods, and findings be shared with col- 
leagues. Good research procedures, once demys- 
tified, are well within the grasp of most school per- 
sonnel. Lastly, we think the probable benefits of 
our recommendations far outweigh any potential 
for misunderstanding. 

Survey Research

Another teacher might not be interested in com-
paring instructional methods. He or she might say,
"I'm more interested in the general feelings my stu-
dents have about history. What do they like about
their history classes? What do they dislike? Why?
What types of history are liked the best or least?
How do the feelings of students of different ages,
sexes, and ethnicity in our school compare? In our
district?"

These sorts of questions can best be answered
through a variety of survey techniques that
measure student attitudes toward their history clas-
ses. Questionnaires or interview schedules would
need to be prepared and their validity and reliability 
checked in some way; the instruments could then 
be given to students, teachers, counselors, or 
other appropriate individuals to complete. 

The difficulties involved in survey research are
mainly twofold: (1) ensuring that the questions to
be answered are clear and not misleading (this can
be accomplished, to a fair extent, by using objec-
tive or "closed-ended" questions, ensuring that they
all pertain to the topic under investigation, and
then further eliminating ambiguity by a small pilot
testing of a draft of the questionnaire); and (2) get-
ting a sufficient number of the questionnaires com-
pleted and returned from the intended group so
that meaningful analyses can be made (this can be
furthered by a second, and sometimes third, ad-
ministration of the questionnaire to non-returnees).
The big advantage of questionnaire research is that
it can provide a lot of information from quite a
large sample of individuals. If more details about
particular questions are desired, however, a
teacher can also conduct personal interviews with
students. The advantage of an interview (over a
questionnaire) is that open-ended questions (i.e.,
those requiring a written response of some length)
can be used with greater confidence, particular
questions of special interest or value can be pur-
sued in depth, follow-up questions can be asked,
items that are unclear can be explained, etc. Care
must be taken not to forget, however, that data ob-
tained through surveys are only a description of
what is and not necessarily what should be. Survey
results can, however, suggest possible hypotheses
to investigate using some of the other methods
described in this chapter. 6

Content Analysis 

Yet another teacher might be interested in the
accuracy of the images or conceptions presented
to students in their history textbooks. She or he
might ask, "Is the content (written or visual)
presented to students in history texts biased in any
way, and if so, how?" Answering this question calls
for a content analysis. A content analysis is just
what its name implies: an analysis of the written or
visual contents of a document. A person's or
group's conscious and unconscious beliefs, at-
titudes, values, and ideas are often revealed in the
things they write (draw, paint, etc.) in magazines,
newspapers, novels, plays, advertisements, and
books. Since history textbooks are composed
primarily of written material, this material can be
analyzed in any one of a number of ways. To
analyze the contents of a textbook (or textbooks),
however, a teacher first needs to plan how to
select and order the contents that are available for
analysis. Pertinent categories must be developed
so the teacher can identify and then count and
compare that which he or she thinks is important.

This is the nub of content analysis: defining as
precisely as possible those aspects of the content
the teacher wants to investigate and then formulat-
ing relevant categories that are so explicit that 
another teacher who uses them to examine the 
same material would find essentially the same 
proportion of topics emphasized or ignored. 

Suppose, for example, that a teacher is inter-
ested in the sorts of heroes being presented to stu-
dents in various history textbooks. He or she
would first select the sample of texts to be
analyzed, that is, which textbooks he or she
would read, on what subject(s), covering what time
period, and which editions (e.g., current U.S. his-
tory texts available for use in his or her district).
Categories could then be formulated. Possibilities
might include the physical, emotional, and social
characteristics of heroes; these could in turn be
broken down into even smaller coding units, such
as the following:

Physical        Emotional       Social

hair color      warm            race
eye color       aloof           religion
age             hostile         occupation
etc.            etc.            etc.

A coding sheet would then be prepared to tally the
data in each of the categories as they are identified in
each text. Comparisons could then readily be
made.
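
A coding sheet of this kind can be kept on paper, but the tallying step can also be sketched in a few lines of Python. In the following illustration, the function name, the textbook titles, and the coded observations are all hypothetical; each observation records the text, the category, and the coding unit noted by the teacher.

```python
from collections import Counter, defaultdict


def tally_codes(observations):
    """Count how often each (category, coding unit) pair appears in each text."""
    sheet = defaultdict(Counter)
    for textbook, category, code in observations:
        sheet[textbook][(category, code)] += 1
    return sheet


# Hypothetical coded observations of heroes in two textbooks.
observations = [
    ("Text A", "Social", "occupation: soldier"),
    ("Text A", "Social", "occupation: soldier"),
    ("Text A", "Emotional", "aloof"),
    ("Text B", "Social", "occupation: inventor"),
    ("Text B", "Emotional", "warm"),
    ("Text B", "Emotional", "warm"),
]

coding_sheet = tally_codes(observations)
for textbook, counts in sorted(coding_sheet.items()):
    print(textbook, dict(counts))
```

Because the categories are written down explicitly, a second teacher can code the same texts and compare tallies, which is the check on explicitness described above.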

A major advantage of content analysis is that it
is unobtrusive. The teacher can "observe" without
being observed, since the "contents" being
analyzed are not influenced by the teacher's
presence. Information that might be difficult or
even impossible to obtain through direct observa-
tion or other means can be gained through
analysis of textbooks and other available com-
munication material without the author or publisher
realizing that it is being examined. Furthermore,
replication of a content analysis by another teacher
is relatively easy. Thirdly, the information obtained
through a content analysis of textbooks can be
very helpful in planning for instruction. Such infor-
mation can suggest additional data students need
to get a more accurate and complete picture of the
world in which they live, the factors and forces
within it, and how these factors and forces impinge
on people's lives.

Correlational Research

A teacher might ask, "How can we predict
which sorts of individuals are likely to have trouble
learning historical subject matter?" If we could
make fairly accurate predictions in this regard, then
perhaps we could suggest some corrective
measures teachers could employ to help such in-
dividuals, avoiding production of "history-haters."
In this instance, correlational research may be the
most appropriate methodology. An interested
teacher could use a variety of measures to collect
different sorts of data on students, including their
performance on a number of tasks related to his-
tory learning (e.g., reading historical accounts,
utilizing maps), their demographic characteristics,
aspects of their backgrounds, their early experien-
ces with history courses and history teachers, the
kinds of history courses they have taken, and any-
thing else that might conceivably point up how
those students who do well (learn history) differ
from those who do poorly.
The teacher might then look for patterns of
some sort in each group of students (those who
learn easily and those who have difficulty). What
do those who learn history easily seem to have in
common? What do they seem to be doing that
those who have trouble learning history seem to ig-
nore or avoid? What do they apparently not do?

The information obtained from such research
can help a teacher predict more accurately the
likelihood of learning difficulties for certain types of
students in history courses and even, perhaps, sug-
gest some things to try with different groups of stu-
dents to help them learn.

In short, correlational research seeks to inves-
tigate whether one or more relationships of some
type exist. The approach requires no manipulation
or intervention on the part of the teacher other
than that required to administer the instrument(s)
necessary to collect the data desired. In general,
this type of research would be undertaken when a
teacher wants to look for and describe relation-
ships that may exist among naturally occurring
phenomena, without trying in any way to alter
these phenomena.
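
One widely used index of such a relationship is the Pearson correlation coefficient, which ranges from -1 to +1. The following minimal Python sketch computes it for two sets of scores; the function name, the variables chosen, and the scores are hypothetical and are offered only as an example of the arithmetic involved.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Hypothetical scores for six students on a map-reading task and a history test.
map_reading = [12, 15, 9, 20, 17, 11]
history_test = [58, 66, 50, 81, 70, 60]

r = pearson_r(map_reading, history_test)
print(round(r, 2))
```

A coefficient near zero would suggest that map-reading skill is of little use in predicting history achievement for these students; a high coefficient would mark it as a variable worth further attention, though not as a demonstrated cause.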

Causal-Comparative Research

A teacher might pursue additional sorts of inves-
tigations in which the variables involved cannot be
manipulated. For example, a teacher 9 might be in-
terested in discovering whether high school stu-
dents enrolled in a college-bound curriculum feel
differently about history than high school students
enrolled in a non-college-bound curriculum. If this
question were to be investigated experimentally,
students would have to be randomly assigned to
college-bound and non-college-bound curricula,
and then their attitudes compared by means of one
or more assessment devices. Conceptually, of
course, this is possible, but in practice it would be
impossible to do.

To test this question using a causal-comparative
design, however, two groups of students who al-
ready exist, one enrolled in a college-bound
program and the second in a non-college-bound
program, could be compared to see if they differ in
their feelings toward history. Suppose they do. Can
the teacher conclude that the difference in cur-
ricula produced the difference in feelings? Alas, no.
The teacher can only conclude that a relationship
of some sort exists, but he or she cannot say what
caused the relationship.

Thus, interpretations of causal-comparative re-
search are limited because the teacher cannot say
whether a particular variable is a cause or a result
of the behavior(s) observed. In the example
presented here, he or she would not know if any
perceived differences in feelings between the two
groups were due to the enrollment in the different
type of curricula; if enrollment in the different cur-
ricula was due to a difference in attitude between
the two groups; or if some third, unidentified, factor
was at work.

Despite problems of interpretation, causal-com-
parative studies are of value in identifying possible
causes of observed variations in the behavior pat-
terns of students. These possible causes can then
be investigated using experimental or other
methods of research. Furthermore, additional infor-
mation can sometimes strengthen the argument for
causation, as in the research linking cigarette smok- 
ing and cancer. 

Ethnographic and Case Study Research 

In all of the examples so far presented, the ques-
tions being asked involve how well or how much
or how accurately social studies learnings or at-
titudes or ideas exist or are being developed. Thus,
possible avenues of research include comparing
alternative methods of teaching social
studies (using history as an example), surveying dif-
ferent groups of social studies students or social
studies professionals (teachers, supervisors, etc.),
or analyzing different social studies texts.

Quite another type of question can be asked
about the teaching and learning of social studies,
however. A teacher might be interested in knowing
not how much or how well or how accurately, but
simply "how." In the case of history, just how do
history teachers teach their subject? What kinds of
things do they do as they go about their daily
routine? What sorts of things do students do? In
what kinds of activities do they engage? What ex-
plicit and implicit rules of the game in history class-
es seem to help or hinder the process of learning?

To gain some insight into these concerns, an
ethnographic methodology can be utilized. A
teacher who wishes to further his or her under-
standing of how history is actually taught would try
to document or portray the everyday experiences
of students (and teachers, if possible) in history
classrooms. The focus would be on only one stu-
dent or one classroom (or a small number of them
at most). The teacher would observe the student or
the classroom on as regular a basis as possible
(perhaps during preparation period) and attempt to
describe, as fully and as richly as possible, what he
or she sees going on. Descriptions (a better word
might be "portrayals") might depict the social at-
mosphere of the classroom; the intellectual and
emotional experiences of students; the manner in
which the teacher (student) acts toward and reacts
to (other) students of different ethnicities, sexes, or
abilities; how the "rules" of the classroom are
learned, modified, and enforced; the kinds of ques-
tions the teacher (and students) ask; and so forth.
The data to be collected might include detailed
prose descriptions written on legal-sized tablets by
the teacher/observer, audiotapes of pupil-student
conferences, videotapes of classroom discussions,
examples of teacher lesson plans and student
work, sociograms depicting "power" relationships
in a classroom, and flowcharts illustrating the direc-
tion and frequency of certain types of comments
(e.g., the kinds of questions asked by teacher and
students to one another and the responses dif-
ferent kinds produce). 10

Ethnographic or case study research also lends 
itself well to a detailed study of individuals. Some- 
times much can be learned from studying just one 
individual. For example, some students learn his- 
tory very easily. In hopes of gaining insight into 
why this is the case, a teacher might observe one 
such student on a regular basis to see if there are 
any noticeable patterns or regularities in the stu- 
dent's behavior. The student's teachers (coun- 
selors, coaches, etc.), as well as the student, might 
be interviewed in depth. A similar series of observa- 
tions and interviews might be conducted with a stu- 
dent who finds history very difficult to learn. As 
much information as possible (study style, attitudes 
toward history, approach to the subject, behavior 
in class, etc.) would be collected. Through the 
study of a somewhat unique individual, insights 
might be gained that would help the teacher with 
similar students in the future. 

In short, then, the goal of a teacher engaging in 
ethnographic or case study research is to "paint a 
portrait" of a history (or any social studies) class- 
room (or an individual) in as thorough, accurate, 
and vivid a manner as possible so that others can 
also "see" that classroom, its participants, and 
what they do. Indeed, ethnographic research 
seems a particularly viable approach for use in 
classrooms. Teachers contemplating using this 
methodology should keep in mind the cautions 
and recommendations we made in Chapter 3 and 
consult one or more of the sources we mentioned 
on page 29. 

Classroom teachers can (and should, we would 
argue) participate in this research endeavor. There 
is so much in our field about which we know so lit- 
tle. So many questions remain unanswered. So 
much information is needed. 

A Final Word 

We recognize that our suggestions do not easily 
fit into the typical daily activities of most teachers 
(or other personnel). We acknowledge also that car- 
rying out such efforts requires additional time and 
energy, which, given the demands on teachers, 
may seem excessive. However, we know of 
teachers who have, despite their obligations, found 
it possible to carry out such studies, including 
quasi-experimental research. They tell us that the 
effort required was more than compensated for by 
the information gained and the intellectual stimula- 
tion provided by the process. Thus, we are en- 
couraged to commend such endeavors to others. 

Notes 

1. For a classroom teacher's analysis of why so 
few of her peers do research (but also why she 
thinks they should), see McKee (1986). 

2. For some further thoughts as to why class- 
room teachers do not engage in research, see 
McPhie (1979). 

3. For some thoughts and data on student inter- 
est in the social studies, see Schug, et al. (1984) 
and Shaughnessy and Haladyna (1985). For a 
response to Schug, et al., see Allen (1984). 

4. A basic but clear discussion of experimental 
research in the classroom can be found in Fer- 
guson (1986). Examples of experimental research 
in social studies education include Gilmore and 
McKinney (1986); Kleg, Karabinus, and Carter 
(1986); Yoho (1986); and Betres, Zajano, and 
Gumieniak (1984). 

5. An extremely thorough treatment of this type 
of research can be found in Cook and Campbell 
(1979). Examples of quasi-experimental research in 
social studies education include Beem and Brug- 
man (1986); Barnes and Curlette (1985); Hahn and 
Avery (1985); and McKenzie and Sawyer (1986). 
For more details and examples of how to do quasi- 
experimental research in social studies classrooms, 
see Shaver (1979a). 

6. Some helpful ideas about survey research 
can be found in Smith (1986). Examples of ques- 
tionnaire- or interview-type survey research in so- 
cial studies education include Bennett (1984); 
Jantz, et al. (1985); LeSourd (1984); and Schug 
and Birkey (1985). 

7. A good beginning reference for content 
analysis research is Wiseman and Aron (1970). Ex- 
amples of content analysis research in social 
studies education include Anyon (1978); Stanley 
(1984); Hahn and Blankenship (1983); Romanish 
(1983); and Saltonstall (1979). 

8. Examples of correlational research in social 
studies education include Curtis (1983) and 
Haladyna, Shaughnessy, and Redsun (1982). 

9. Although we continue to refer to only one 
teacher in these research examples, we would like 
to stress that more than one teacher might be in- 
volved in a research endeavor in social studies 
education. Two or more teachers, acting as a re- 
search team, for example, might decide to conduct 
research in their classrooms or with students. 

10. A clearly written introduction to ethnographic 
research for social studies teachers can be found 
in White (1986). Examples of case study or eth- 
nographic research in social studies education in- 
clude Adler (1984); Diem (1986); Goodman and 
Adler (1985); and Levstik (1986). 